Detailed Description
To make the technical solution of the present application easier to explain, some concepts used in current data warehouses are introduced before the solution itself:
(1) Table
Tables are the most important component of a data warehouse. A table record consists of a key, measures, and attribute data (for example, an employee table consists of the employee number (the key) and employee attributes such as name and age). In the technical solution of the present application, the following two types of table exist in the construction of a data warehouse:
Incremental table: To improve performance, a table with a large data volume is synchronized incrementally according to its record-modification timestamp field (generally gmt_modify). Each snapshot of the incremental table retains one copy of the incremental data. The table is named tablename_{yyyymmdd}_delta, or tablename_delta with partition field dt=yyyymmdd;
Full table: Each snapshot retains one full copy of the data. The full table can be synchronized in full from the production database, or produced by performing a full outer join between the incremental data synchronized from the production database and yesterday's snapshot of the full table, retaining the latest full copy. The structure of the full table is identical to that of the incremental table. The table is named tablename_{yyyymmdd}, or tablename with partition field dt=yyyymmdd.
(2) View
A view is a virtual table whose content is defined by a query. Like a real table, a view contains a series of named columns and rows of data. Note, however, that a view does not exist in the database as a stored set of data values.
(3) Metadata
Metadata is data that describes data; in essence it is descriptive information about data and information resources, including the table structure information of business tables, the table structure information of data warehouse tables, and so on. The metadata that matters most in data warehouse construction includes scheduling metadata, SQL execution log metadata, table structure metadata, synchronization center metadata, and timed-task metadata.
(4) Merge task
The merge task is an indispensable part of data warehouse construction. Its role is to merge the data of the incremental table with yesterday's snapshot of the full table to generate the latest full snapshot.
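The merge described above can be illustrated with a minimal in-memory sketch. It is not the merge script of the present application: the function name merge_snapshot, the key column "id", and the sample rows are all hypothetical, and the logic assumes that an incremental row always supersedes the snapshot row with the same key.

```python
from typing import Dict, Iterable, List


def merge_snapshot(
    yesterday_full: Iterable[dict],
    today_delta: Iterable[dict],
    key: str = "id",  # hypothetical key column
) -> List[dict]:
    """Merge a delta into yesterday's full snapshot.

    Behaves like a full outer join on the key in which the delta row,
    when present, replaces the snapshot row.
    """
    merged: Dict[object, dict] = {row[key]: row for row in yesterday_full}
    for row in today_delta:
        merged[row[key]] = row  # unseen key: insert; existing key: replace
    return [merged[k] for k in sorted(merged)]


# Hypothetical sample data.
yesterday = [
    {"id": 1, "name": "Ann", "gmt_modify": "2016-01-01"},
    {"id": 2, "name": "Bob", "gmt_modify": "2016-01-01"},
]
delta = [
    {"id": 2, "name": "Robert", "gmt_modify": "2016-01-02"},
    {"id": 3, "name": "Cy", "gmt_modify": "2016-01-02"},
]
today_full = merge_snapshot(yesterday, delta)
```

In this sketch the unchanged row (id 1) is carried over, the modified row (id 2) takes its values from the delta, and the new row (id 3) is appended, which is the content the latest full snapshot is expected to hold.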
(5) Synchronization center
The synchronization center is the device or equipment that synchronizes production data into the data warehouse, or flows data warehouse data back to the production system.
Based on the above and on the background of the present application: when a data warehouse is constructed or restructured, a base layer must be established according to the data model architecture. For some incremental tables synchronized from the production database, a full table must be accumulated in the ODS layer of the data warehouse so that the latest full snapshot is retained. This process involves generating the table-creation statement for the full table, writing the merge task script, configuring scheduling dependencies, initializing data, publishing, and other operations. The technical solution of the present application aims to generate base-layer data tables (full tables) in batches by means of merge tasks, thereby reducing the complexity of data warehouse construction and improving development efficiency while maintaining work quality.
Fig. 1 is a schematic flowchart of the data table generation method proposed by the present application. The method includes the following steps:
S101: generate the current data table task according to the structural metadata information of the incremental table and a preset task template.
As stated in the Background, in existing databases the base layer requires many full tables and their configuration is tedious. In a preferred embodiment, the present application is therefore directed at generating data tables of the base-layer full-table type. So that the data table can subsequently be produced quickly, a technician may perform a unified initialization for data table generation before this step, that is, generate the table-creation statement and the data initialization script corresponding to the data table from the structural metadata information.
In a specific application scenario, incremental data tables are named ods_{source system table name}_delta, and full data tables are named ods_{source system table name}.
Because most existing base-layer full tables are produced by full-table merge tasks, in a preferred embodiment of the present application the technician may set the task template to be a merge task template. In this step, merge task code is generated from the structural metadata information of the incremental table and the batch merge task template, and the merge task code is then uploaded to a preset code repository as the data table task. In a specific application scenario, the format of the merge task template for batch generation of base-layer full tables is shown in Table 1 below:
Table 1
Furthermore, the method and apparatus for building ODS-layer full tables in a data warehouse form a highly integrated scheme that involves metadata, the In-Cloud code repository, scheduling dependencies, data merge tasks, data initialization, publishing, and a series of other processes. To further raise the level of automation and the processing efficiency, the structural metadata information of the incremental table can be preprocessed before the current data table task is generated from it and the preset task template. Specifically, in a preferred embodiment of the present application, the incremental table is synchronized to the data warehouse before this step and its structural metadata information is obtained. This establishes a metadata service that can provide the table structure information of the data warehouse and simplifies subsequent processing.
Fig. 2 is a schematic flowchart of the data table generation method proposed in a specific embodiment of the present application. In addition to the synchronization center module and the metadata module described above, the figure involves the following modules:
Middle layer: a data layer in data warehouse construction that integrates and consolidates the synchronized base-layer data, so that downstream applications can use the business data conveniently;
In-Cloud: the integrated development environment (IDE) for data warehouse development, through which data synchronization configuration, model design, ETL development, unit testing, task publishing, operations and maintenance, and similar operations are carried out;
Scheduling: the system that automatically executes data warehouse tasks in order according to configuration.
Based on the above modules, the specific embodiment performs the following steps in the early stage:
Step 1: incremental data is synchronized to the data warehouse through the synchronization center;
Step 2.1: the metadata module generates the table-creation statement for the full table from the structure information of the incremental table and sends it to the In-Cloud module;
Step 2.2: the metadata module generates the merge task from the template and the incremental table structure.
In the above steps, incremental data tables are named ods_{source system table name}_delta and full data tables are named ods_{source system table name}, which lets every module create and query the tables. During synchronization by the synchronization center, synchronization tasks are named imp_{ODPS table name} and merge tasks are named mrg_{ODPS table name}.
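The naming conventions stated above can be captured in a small helper. This is only a sketch: the function name table_and_task_names is hypothetical, and it assumes that the synchronization task is named after the incremental table it loads and the merge task after the full table it produces, which the text does not state explicitly.

```python
def table_and_task_names(source_table: str) -> dict:
    """Derive table and task names from a source system table name."""
    delta = f"ods_{source_table}_delta"  # incremental data table
    full = f"ods_{source_table}"         # full data table
    return {
        "delta_table": delta,
        "full_table": full,
        # Assumption: the sync task loads the incremental table.
        "sync_task": f"imp_{delta}",
        # Assumption: the merge task outputs the full table.
        "merge_task": f"mrg_{full}",
    }


names = table_and_task_names("employee")
print(names)
```

Deriving every name from the single source table name keeps the modules consistent with one another, since each can compute the name it needs without a lookup.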
The metadata module can provide the table structure information of the data warehouse, and the In-Cloud code repository module can generate merge task code from the incremental table's structural metadata information and the batch merge task template and save it by committing it to the In-Cloud code repository through the code repository service. Therefore, once the information has been filled in according to the merge task template for batch generation of base-layer full tables, it can be uploaded as an Excel file to the system dedicated to data table generation. That system calls the metadata module to generate the table-creation statement and data initialization script of the base-layer full table from the structure information of the incremental table, and calls the In-Cloud code repository module to generate the merge task.
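As an illustration of generating a table-creation statement and merge task code from structural metadata and a template, the following sketch uses Python's string.Template. The templates, the function names, the column list, and the Hive-style SQL are all assumptions made for the example; the actual template of the present application is the one referred to as Table 1, and the real dialect depends on the data warehouse in use.

```python
from string import Template
from typing import List, Tuple

Columns = List[Tuple[str, str]]  # (column name, column type)

# Hypothetical templates in a Hive-style SQL dialect.
CREATE_TEMPLATE = Template(
    "CREATE TABLE IF NOT EXISTS $table (\n$columns\n) PARTITIONED BY (dt STRING);"
)
MERGE_TEMPLATE = Template(
    "INSERT OVERWRITE TABLE $full PARTITION (dt='$today')\n"
    "SELECT $select_list\n"
    "FROM (SELECT * FROM $full WHERE dt='$yesterday') f\n"
    "FULL OUTER JOIN (SELECT * FROM $delta WHERE dt='$today') d\n"
    "ON f.$key = d.$key;"
)


def build_create_statement(source_table: str, columns: Columns) -> str:
    """Full table shares the incremental table's structure."""
    cols = ",\n".join(f"  {name} {ctype}" for name, ctype in columns)
    return CREATE_TEMPLATE.substitute(table=f"ods_{source_table}", columns=cols)


def build_merge_sql(
    source_table: str, columns: Columns, key: str, today: str, yesterday: str
) -> str:
    """Prefer the delta row whenever the key appears in the delta."""
    select_list = ",\n       ".join(
        f"CASE WHEN d.{key} IS NOT NULL THEN d.{name} ELSE f.{name} END AS {name}"
        for name, _ in columns
    )
    return MERGE_TEMPLATE.substitute(
        full=f"ods_{source_table}",
        delta=f"ods_{source_table}_delta",
        select_list=select_list,
        key=key,
        today=today,
        yesterday=yesterday,
    )


# Hypothetical structural metadata for an employee table.
employee_columns = [("emp_no", "BIGINT"), ("name", "STRING"), ("age", "INT")]
ddl = build_create_statement("employee", employee_columns)
merge_sql = build_merge_sql("employee", employee_columns, "emp_no", "20160102", "20160101")
print(ddl)
print(merge_sql)
```

Because both statements are derived from the same column list, adding a table to the batch only requires its structural metadata and key column, not hand-written SQL.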
S102: configure the scheduling information of the data table task according to the task template.
With the processing of S101 and S102, the preparatory work for generating the data table is completed; this step is mainly used to configure the nodes, scripts, and other content required for the actual generation. To provide a scheduling configuration service for data table generation, in a preferred embodiment of the present application this step generates, according to the merge task template, the upstream dependency node, task output name, scheduled-task baseline, and scheduled-task owner corresponding to the data table task.
In the specific embodiment of Fig. 2, after receiving the merge task through step 2.2, the In-Cloud module in step 2.3 configures the scheduling information for the merge task according to the template and calls the scheduling configuration service to generate the scheduling dependencies and the merge task output name. Taking the content shown in Table 1 as an example, in this embodiment the scheduling configuration service is called according to the batch merge task template to generate the upstream dependency node, task output name, scheduled-task baseline, and scheduled-task owner of the merge task.
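The four scheduling items named above can be represented as a simple record. The sketch below is an assumption-laden illustration, not the scheduling configuration service itself: the ScheduleConfig class, the schedule_for function, and the sample baseline and owner values are hypothetical, and it assumes that the upstream dependency of a merge task is the synchronization task that loads the incremental table.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScheduleConfig:
    task_name: str      # merge task being scheduled
    upstream_node: str  # upstream dependency node
    output_name: str    # task output name
    baseline: str       # scheduled-task baseline
    owner: str          # scheduled-task owner


def schedule_for(source_table: str, baseline: str, owner: str) -> ScheduleConfig:
    """Build the scheduling record for one merge task."""
    full = f"ods_{source_table}"
    return ScheduleConfig(
        task_name=f"mrg_{full}",
        # Assumption: the merge waits for the sync of the incremental table.
        upstream_node=f"imp_{full}_delta",
        # Assumption: the task's output is named after the full table.
        output_name=full,
        baseline=baseline,
        owner=owner,
    )


# Hypothetical baseline and owner values.
config = schedule_for("employee", baseline="ods_daily", owner="dw_admin")
print(asdict(config))
```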
S103: execute the table-creation statement and the initialization script according to the data table task and the scheduling information, to generate the data table.
After the data table task has been generated in S101 and the scheduling information has been configured in S102, the scheduling module instructs the system in step 3 to package the table-creation statement, the initialization script, and the merge task and publish them online. The In-Cloud module then instructs the system in step 4.1 to execute the table-creation statement. After the system has executed the table-creation statement and the initialization script, the In-Cloud module performs data initialization in step 4.2.
The above flow generates and creates data tables quickly. It replaces work that traditionally had to be done by hand: writing table-creation statements, coding base-layer full-table merge tasks, configuring scheduling dependencies, writing data initialization scripts, packaging, publishing, and executing the table-creation statements and initialization scripts. Data tables are thus generated automatically, which improves the efficiency of data warehouse work.
To achieve the above technical purpose, the present application also proposes a data table generation device which, as shown in Fig. 2, includes:
Generation module 210, configured to generate the current data table task according to the structural metadata information of the incremental table and a preset task template;
Configuration module 220, configured to configure the scheduling information of the data table task according to the task template;
Execution module 230, configured to execute the table-creation statement and the initialization script according to the data table task and the scheduling information, to generate the data table.
In a specific application scenario, the data table is specifically a base-layer full table, and the device further includes:
An initialization module, configured to generate the table-creation statement and the data initialization script corresponding to the data table according to the structural metadata information.
In a specific application scenario, the device further includes:
A synchronization module, configured to synchronize the incremental table to the data warehouse and obtain the structural metadata information of the incremental table.
In a specific application scenario, the generation module is specifically configured to:
Generate merge task code from the structural metadata information of the incremental table and the batch merge task template, and upload the merge task code to a preset code repository as the data table task.
In a specific application scenario, the configuration module is specifically configured to:
Generate, according to the merge task template, the upstream dependency node, task output name, scheduled-task baseline, and scheduled-task owner corresponding to the data table task.
From the above description of the embodiments, those skilled in the art can clearly understand that the present application may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application may be embodied in the form of a software product. The software product may be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive, or portable hard disk) and includes several instructions that cause a computer device (such as a personal computer, server, or network device) to perform the methods described in the implementation scenarios of the present application.
Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of a preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required to implement the present application.
Those skilled in the art will appreciate that the modules of a device in an implementation scenario may be distributed within that device as described, or may be modified accordingly and located in one or more devices different from that of the present implementation scenario. The modules of the above implementation scenario may be combined into one module or further split into multiple submodules.
The serial numbers used in the present application are for description only and do not indicate the relative merits of the implementation scenarios.
The above discloses only several specific implementation scenarios of the present application. The present application is not limited to them, however, and any variation that a person skilled in the art can conceive of shall fall within the protection scope of the present application.