Summary of the Invention
Embodiments of the invention provide a data management method and a data management platform that can improve data management efficiency.
In a first aspect, an embodiment of the invention provides a data management method in which at least one data model is created in a preset asset library, the method further comprising:
determining a data source, and determining, among the at least one data model, a target data model corresponding to the data source;
collecting source data from the data source;
processing the source data according to the target data model to determine target data and metadata;
storing the metadata in a preset management library;
storing the target data in the target data model.
Preferably,
creating at least one data model in the preset asset library comprises:
receiving at least one piece of model information submitted by a user;
for each piece of model information, performing the following: judging whether the current model information satisfies a preset model access rule and, if so, creating the data model corresponding to the current model information in the preset asset library.
Preferably,
before storing the metadata in the preset management library, the method further comprises:
judging whether the metadata satisfies a preset data access rule and, if so, performing the storing of the metadata in the preset management library.
Preferably,
after processing the source data according to the target data model to determine the target data and the metadata, the method further comprises:
determining label information of the target data according to a preset label rule and the target data;
storing the label information in the management library.
Preferably,
the method further comprises:
counting the data volume of the metadata stored in the management library and the data volume of the target data stored in the asset library;
presenting the counted data volume of the metadata and the counted data volume of the target data to the user.
Preferably,
the model information comprises any one or a combination of a model name, a model code and a model description.
In a second aspect, an embodiment of the invention provides a data management platform, comprising:
a model access module, configured to create at least one data model in a set asset library, and to store metadata in a set management library;
a data access module, configured to determine a data source and to determine, among the at least one data model created by the model access module, a target data model corresponding to the data source; to collect source data from the data source; to process the source data according to the target data model to determine target data and the metadata; and to store the target data in the target data model.
Preferably,
the model access module is configured to receive at least one piece of model information submitted by a user and, for each piece of model information, to perform the following: judging whether the current model information satisfies a set model access rule and, if so, creating the data model corresponding to the current model information in the set asset library.
Preferably,
the data access module is further configured to judge whether the metadata satisfies a set data access rule and, if so, to trigger the storing of the metadata in the set management library.
Preferably,
the data access module is further configured to determine label information of the target data according to a set label rule and the target data;
the model access module is further configured to store the label information in the management library.
Preferably,
the platform further comprises a display module;
the display module is configured to count the data volume of the metadata stored in the management library and the data volume of the target data stored in the asset library, and to present the counted data volume of the metadata and the counted data volume of the target data to the user.
Preferably,
the model information comprises any one or a combination of a model name, a model code and a model description.
Embodiments of the invention provide a data management method and a data management platform. The data management method processes the data from the data source by means such as data transformation and data cleansing to obtain target data and metadata, imports the target data into the target data model of the asset library, and imports the metadata into the management library. No manual processing is required during this process and the degree of automation is high, so data management efficiency can be improved.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, an embodiment of the invention provides a data management method, which may comprise the following steps:
Step 101: Create at least one data model in a preset asset library;
Step 102: Determine a data source, and determine, among the at least one data model, a target data model corresponding to the data source;
Step 103:Source data is gathered from data source;
In this embodiment, source data from multiple data sources may be collected simultaneously and the source data of each data source processed separately, although the same processing method is applied to each.
Step 104: Process the source data according to the target data model to determine target data and metadata;
Step 105: Store the metadata in a preset management library;
Step 106: Store the target data in the target data model.
The data management method processes the data from the data source by data transformation, data cleansing and the like to obtain target data and metadata, imports the target data into the target data model of the asset library, and imports the metadata into the management library. No manual processing is required during this process and the degree of automation is high, so data management efficiency can be improved.
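Steps 102 to 106 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the invention: the names DataModel and run_pipeline, the use of in-memory dictionaries and lists for the asset library and the management library, and the specific cleansing rule (keep only the columns of the model and drop incomplete rows) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataModel:
    """A data model held in the asset library (hypothetical structure)."""
    name: str
    columns: list
    rows: list = field(default_factory=list)


def run_pipeline(asset_library, management_library, source_to_model,
                 source_name, source_rows):
    """Process already-collected source rows (step 103) for one data source."""
    # Step 102: find the target data model that corresponds to the data source.
    model = asset_library[source_to_model[source_name]]

    # Step 104: transform and clean. Assumed rule: keep only the columns the
    # model defines and drop rows that lack a value for any of them.
    target_data = []
    for row in source_rows:
        cleaned = {column: row.get(column) for column in model.columns}
        if all(value is not None for value in cleaned.values()):
            target_data.append(cleaned)
    metadata = {
        "source": source_name,
        "model": model.name,
        "columns": list(model.columns),
        "row_count": len(target_data),
    }

    # Step 105: store the metadata in the management library.
    management_library.append(metadata)
    # Step 106: store the target data in the target data model.
    model.rows.extend(target_data)
    return metadata
```

In this sketch the metadata is simply a small description of what was loaded; what the metadata actually contains is not specified in the text above.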
In a practical application scenario, the data warehouse comprises three sub-warehouses: a cleaning library, an asset library and a management library.
The cleaning library is mainly responsible for handling frequent external data access. It serves as a buffer layer for data processing and cleansing and is open to data processing personnel; data and models in it that have not been operated on for a certain period are cleared. The asset library is mainly responsible for storing large volumes of data assets. The management library is mainly responsible for storing the metadata of the data assets.
In this embodiment, the cleaning library stores source data from different data sources.
The data management method involves two audit processes, data audit and model audit, which are described further below.
In one embodiment of the invention, the validity of the model information must be verified in order to control the model access process. Accordingly, creating at least one data model in the preset asset library comprises:
receiving at least one piece of model information submitted by a user;
for each piece of model information, performing the following: judging whether the current model information satisfies a preset model access rule and, if so, creating the data model corresponding to the current model information in the preset asset library.
The model information comprises any one or a combination of a model name, a model code and a model description.
The model access rule is used to screen the model information and to identify the model information that satisfies the rule. In this way multiple data models can be created in the asset library, each with different characteristics, to store the target data corresponding to its model information.
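The screening of model information can be sketched as below. The sketch assumes, purely for illustration, that the model access rule admits only models whose description declares the relational type; the function names, the dictionary layout of the model information and the use of the model code as the key in the asset library are hypothetical.

```python
def relational_only(model_info):
    """Hypothetical model access rule: admit only relational models."""
    return model_info.get("description", {}).get("type") == "relational"


def create_models(submissions, access_rule, asset_library):
    """Create a data model for every submission that passes the access rule."""
    created = []
    for info in submissions:
        if access_rule(info):
            # Assumed layout: models are keyed by model code in the asset library.
            asset_library[info["code"]] = {
                "name": info["name"],
                "description": info["description"],
                "rows": [],
            }
            created.append(info["code"])
    return created
```

Passing the rule in as a function keeps the creation loop unchanged when the rule itself changes.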
In one embodiment of the invention, in order to verify the validity of the target data to be accessed, before storing the metadata in the preset management library the method further comprises:
judging whether the metadata satisfies a preset data access rule and, if so, performing the storing of the metadata in the preset management library.
This audit function ensures that the metadata and the target data are transferred correctly.
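A minimal sketch of this data audit follows. The text does not say what the data access rule checks, so the rule shown here (the metadata must name its data source and its data model) is an assumption, as are the function names.

```python
def has_required_fields(metadata):
    """Hypothetical data access rule: source and model must both be present."""
    return all(metadata.get(key) for key in ("source", "model"))


def store_metadata_if_valid(metadata, access_rule, management_library):
    """Store the metadata only when it passes the data access rule."""
    if access_rule(metadata):
        management_library.append(metadata)
        return True
    return False
```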
To help users retrieve the target data they need, labels must be added to data models and to target data.
The label of a data model is the data model's own feature label. It describes the characteristics of the data model and is maintained manually when the model is created or imported.
The label of target data is a feature label generated after processing; it is the result of business enablement applied to the target data. Label management divides labels into multiple types, and each type implements its own label management methods. Here, the label management methods are: a method by which the label implements page display logic; a filter method that converts the label's screening logic into a form that can be queried by ES or similar means; and a result formatting method that converts the label's query result into the related label dimension table.
In one embodiment of the invention, after processing the source data according to the target data model to determine the target data and the metadata, the method further comprises:
determining label information of the target data according to a preset label rule and the target data, and storing the label information in the management library.
For example, the label rule divides the target data by month into two parts, A and B, and adds the label "August" to part A and the label "September" to part B. When the target data is searched, the keyword "August" retrieves the part A target data and the keyword "September" retrieves the part B target data.
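The month example can be sketched as follows. The record layout, the field name created and the function names are hypothetical; the sketch only shows one way a label rule could map target data to month labels and how a keyword search could use them.

```python
from datetime import date

MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def label_by_month(records, date_field="created"):
    """Label rule: the label of each record is the month name of its date."""
    return {
        record_id: MONTH_NAMES[record[date_field].month - 1]
        for record_id, record in records.items()
    }


def search_by_label(labels, keyword):
    """Return the ids of all records whose label equals the keyword."""
    return sorted(rid for rid, label in labels.items() if label == keyword)
```

The dictionary returned by label_by_month plays the role of the label information that would be stored in the management library.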
The data management method is implemented on the basis of ETL (Extract-Transform-Load) technology.
A data warehouse is analysis-oriented, whereas an operational database is application-oriented. In this method, therefore, it is mainly the target data model in the data warehouse that determines which data need to be extracted from the application database.
In actual development, developers will frequently find that some ETL steps do not match the description of the target data model. When that happens, the requirements must be rechecked and redesigned and the ETL redone. As stated in the database series, any change to the requirements means starting over and updating the requirements document.
The transformation process mainly refers to changing the structure of the extracted data so that it satisfies the target data model. The transformation process is also responsible for data quality work, a part also known as data cleansing.
The loading process loads the target data, whose quality has been assured by the transformation, into the target data model. Loading falls into two kinds: initial loading and refresh loading. Initial loading can involve large volumes of data, whereas refresh loading is a kind of micro-batch loading.
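The difference between the two kinds of loading can be sketched as below. The text does not specify how a refresh load merges new data, so the insert-or-update behaviour keyed on an id column is an assumption, as are the function names.

```python
def initial_load(model_rows, all_rows):
    """Initial load: replace the model contents with the full data set."""
    model_rows.clear()
    model_rows.extend(all_rows)
    return len(all_rows)


def refresh_load(model_rows, batch, key="id"):
    """Refresh load: merge a small batch, updating rows whose key already exists."""
    position = {row[key]: i for i, row in enumerate(model_rows)}
    for row in batch:
        if row[key] in position:
            model_rows[position[row[key]]] = row  # assumed: newer row wins
        else:
            position[row[key]] = len(model_rows)
            model_rows.append(row)
    return len(batch)
```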
In practical applications, label management can also be extended to support adding, modifying and deleting labels, which is not described further here.
In one embodiment of the invention, in order to show the user the data volume stored in the data warehouse, the method further comprises:
counting the data volume of the metadata stored in the management library and the data volume of the target data stored in the asset library;
presenting the counted data volume of the metadata and the counted data volume of the target data to the user.
In addition, information in other dimensions can be shown to the user, such as the capacity of the asset library and of the management library and the number of data models in the asset library.
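The statistics can be sketched as follows. The sketch assumes that data volume is measured as a record count and that the asset library maps each data model name to its list of rows; both are illustrative choices, and the function names are hypothetical.

```python
def collect_statistics(management_library, asset_library):
    """Count metadata records, target data records and data models."""
    return {
        "metadata_count": len(management_library),
        "target_data_count": sum(len(rows) for rows in asset_library.values()),
        "model_count": len(asset_library),
    }


def format_statistics(stats):
    """Render the statistics as plain text, one figure per line."""
    return "\n".join(f"{name}: {value}" for name, value in stats.items())
```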
As shown in Fig. 2, this embodiment describes the data management method in detail, taking as an example the access of source data collected from a data source into the data warehouse. The method comprises the following steps:
Step 201: Receive at least one piece of model information submitted by a user.
The model information comprises a model name, a model code and a model description.
The model description contains the characteristic information of the data model.
Step 202: For each piece of model information, perform the following: judge whether the current model information satisfies the preset model access rule and, if so, perform step 203.
For example, if the model access rule sets the data model type to relational and the data model type given in the model description is also relational, the data model is created from the model information; if the data model type given in the model description is hierarchical, the creation process is not performed.
Step 203: Create the data model corresponding to the current model information in the preset asset library.
Step 204: Determine a data source, and determine, among the at least one data model, a target data model corresponding to the data source.
At least one access task set by the user is determined, where an access task comprises the data source to be collected and the target data model that the data source is to access. The method determines the execution order according to the time at which each access task was received. In practical applications, an execution period can further be set for each access task, so that the task is executed at regular intervals.
Step 205:Source data is gathered from data source.
In this embodiment, source data from multiple data sources may be collected simultaneously and the source data of each data source processed separately, although the same processing method is applied to each.
Step 206: Process the source data according to the target data model to determine target data and metadata.
The source data undergoes a series of processing operations, such as transformation and cleansing, according to the format requirements of the target data model, finally yielding the target data and the metadata that describes the target data.
Step 207: Determine label information of the target data according to the preset label rule and the target data, and store the label information in the management library.
The target data is differentiated according to the label rule, and different label information is added to different types of target data.
Step 208: Judge whether the metadata satisfies the preset data access rule and, if so, perform step 209.
Step 209: Store the metadata in the preset management library.
Step 210: Store the target data in the target data model.
In this embodiment, the metadata and the target data are stored in the management library and the asset library of the data warehouse, respectively. Separating the target data from the metadata facilitates data management: the user can search by metadata in order to locate the required target data in the asset library.
Step 211: Count the data volume of the metadata stored in the management library and the data volume of the target data stored in the asset library.
So that the user understands how much data the data warehouse stores and can adjust storage locations in good time, this embodiment counts the data volume of the management library and the data volume of the asset library separately.
Step 212: Present the counted data volume of the metadata and the counted data volume of the target data to the user.
The statistics can be shown to the user as text, charts or the like.
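The ordering of access tasks described under step 204 can be sketched as below. The sketch assumes that a task's first run time equals the time it was received and that a period of zero means the task runs once; the class name, field names and the priority-queue approach are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class AccessTask:
    run_at: float                                      # next execution time
    source: str = field(compare=False)                 # data source to collect
    target_model: str = field(compare=False)           # model the source accesses
    period: float = field(default=0.0, compare=False)  # 0 means run once


def execution_order(tasks, until):
    """Return (time, source) pairs in execution order up to the time `until`."""
    queue = list(tasks)
    heapq.heapify(queue)
    runs = []
    while queue:
        task = heapq.heappop(queue)
        if task.run_at > until:
            break  # every remaining task is scheduled even later
        runs.append((task.run_at, task.source))
        if task.period > 0:
            # Periodic task: schedule its next execution.
            heapq.heappush(queue, AccessTask(task.run_at + task.period,
                                             task.source, task.target_model,
                                             task.period))
    return runs
```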
As shown in Fig. 3, an embodiment of the invention provides a data management platform, comprising:
a model access module 301, configured to create at least one data model in a set asset library, and to store metadata in a set management library;
a data access module 302, configured to determine a data source and to determine, among the at least one data model created by the model access module 301, a target data model corresponding to the data source; to collect source data from the data source; to process the source data according to the target data model to determine target data and metadata; and to store the target data in the target data model.
In one embodiment of the invention, the model access module 301 is configured to receive at least one piece of model information submitted by a user and, for each piece of model information, to perform the following: judging whether the current model information satisfies a set model access rule and, if so, creating the data model corresponding to the current model information in the set asset library.
In one embodiment of the invention, the data access module 302 is further configured to judge whether the metadata satisfies a set data access rule and, if so, to trigger the storing of the metadata in the set management library.
In one embodiment of the invention, the data access module 302 is further configured to determine label information of the target data according to a set label rule and the target data;
the model access module 301 is further configured to store the label information in the management library.
In one embodiment of the invention, as shown in Fig. 4, the data management platform further comprises a display module 303;
the display module 303 is configured to count the data volume of the metadata stored in the management library and the data volume of the target data stored in the asset library, and to present the counted data volume of the metadata and the counted data volume of the target data to the user.
In one embodiment of the invention, the model information comprises any one or a combination of a model name, a model code and a model description.
Because the information exchange between the units of the above apparatus, their execution processes and related content are based on the same concept as the method embodiments of the invention, reference may be made to the description of the method embodiments for details, which are not repeated here.
An embodiment of the invention provides a computer-readable medium comprising execution instructions; when a processor of a storage controller executes the execution instructions, the storage controller performs the method of the above embodiments.
An embodiment of the invention provides a storage controller, comprising a processor, a memory and a bus;
the memory is configured to store execution instructions, and the processor is connected to the memory through the bus; when the storage controller runs, the processor executes the execution instructions stored in the memory, so that the storage controller performs the method of the above embodiments.
In summary, the embodiments of the invention have at least the following effects:
1. In the embodiments of the invention, the data management method processes the data from the data source by data transformation, data cleansing and the like to obtain target data and metadata, imports the target data into the target data model of the asset library, and imports the metadata into the management library. No manual processing is required during this process and the degree of automation is high, so data management efficiency can be improved.
2. In the embodiments of the invention, the model access process is audited by means of the model information and the data access process is audited by means of the metadata, which ensures the security of the data warehouse.
3. In the embodiments of the invention, label information is added to data models by means of the model information and to target data by means of the label rule. Users can search data models and target data by label information and thereby obtain the data they need.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the foregoing describes only preferred embodiments of the invention. These serve merely to illustrate the technical solutions of the invention and are not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the invention falls within the scope of protection of the invention.