CN104077128B - Data processing method and device

Info

Publication number: CN104077128B
Authority: CN (China)
Legal status: Active
Application number: CN201410251500.2A
Other languages: Chinese (zh)
Other versions: CN104077128A (en)
Inventors: 杨秀祯, 杨凌, 薛颖慧
Current assignee: China Construction Bank Corp
Original assignee: China Construction Bank Corp
Events:
Application filed by China Construction Bank Corp
Priority to CN201410251500.2A
Publication of CN104077128A
Application granted
Publication of CN104077128B


Abstract

An embodiment of the invention discloses a data processing method, including: obtaining a control instruction, input by a user, that carries a macro control statement, where the macro control statement includes the macro to be called and prompt information, and the prompt information includes the storage path of the data represented by a variable to be processed and the operating parameters for processing that data; in response to the control instruction, reading the data represented by the variable to be processed from the file at the data storage path according to the prompt information; reading the macro program corresponding to the macro from a specific file according to the macro; and setting the execution parameters of the macro program to the operating parameters and using the macro program to perform the corresponding data processing on the data represented by the variable to be processed. An embodiment of the invention also discloses a data processing device. The embodiments of the invention process data quickly and efficiently, and are simple to operate.

Description

Data processing method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a data processing method and device.
Background
With the development of various applications, it is often necessary to process the data represented by various variables. For example, before a bank analyzes the data represented by the January spending-amount variable of all its customers, that data must first be processed so that the analysis is more accurate and better reflects customers' consumption characteristics; for instance, outliers are removed by dropping the largest 1% of the values represented by the January spending-amount variable. The existing approach relies entirely on manual work: the user first gathers the January spending-amount data of all customers from different datasets into a single dataset, calculates the spending amount that marks the top 1%, and then deletes every record in the dataset whose value exceeds it. Because this method depends entirely on manual operation, it takes a great deal of time when there are many variables and is inefficient.
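The manual outlier step described above (dropping the largest 1% of spending amounts) can be sketched in Python as follows. This sketch is illustrative and not part of the patent; the record layout, the field name `amount`, and the nearest-rank percentile rule are assumptions.

```python
import math


def trim_top_percent(records, field="amount", pct=1.0):
    """Drop records whose `field` value lies above the (100 - pct)th percentile.

    Uses the nearest-rank percentile (an assumption); SAS supports several
    percentile definitions, so its cutoff may differ slightly.
    """
    values = sorted(r[field] for r in records)
    if not values:
        return []
    # Nearest rank: smallest value with at least (100 - pct)% of values at or below it.
    rank = math.ceil(len(values) * (100 - pct) / 100)
    cutoff = values[max(rank, 1) - 1]
    return [r for r in records if r[field] <= cutoff]
```

With 200 records whose amounts are 1 to 200, the cutoff is 198, so the two largest records are removed.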
Summary of the invention
Embodiments of the present invention provide a data processing method and device that process data quickly and efficiently and are simple to operate.
An embodiment of the present invention provides a data processing method, including:
obtaining a control instruction, input by a user, that carries a macro control statement, where the macro control statement includes the macro to be called and prompt information, and the prompt information includes the storage path of the data represented by a variable to be processed and the operating parameters for processing that data;
in response to the control instruction, reading the data represented by the variable to be processed from the file at the data storage path according to the prompt information;
reading the macro program corresponding to the macro from a specific file according to the macro; and
setting the execution parameters of the macro program to the operating parameters, and using the macro program to perform the corresponding data processing on the data represented by the variable to be processed.
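The four steps of the method can be sketched in Python as a small dispatcher. This is a hypothetical illustration, not the patent's implementation: the call syntax `%name(key=value, ...)`, the parameter names `path` and `var`, the in-memory `MACRO_REGISTRY`, and the example macro `count_nonmissing` are all assumptions.

```python
import re

# Hypothetical registry standing in for the "specific file" of macro programs.
MACRO_REGISTRY = {}


def macro(name):
    """Register a Python callable as the macro program for `name`."""
    def register(func):
        MACRO_REGISTRY[name] = func
        return func
    return register


CALL_RE = re.compile(r"^\s*%(\w+)\s*\((.*)\)\s*;?\s*$")


def parse_macro_call(statement):
    """Split '%name(k1=v1, k2=v2)' into ('name', {'k1': 'v1', 'k2': 'v2'})."""
    m = CALL_RE.match(statement)
    if m is None:
        raise ValueError(f"not a macro call: {statement!r}")
    params = {}
    # Naive comma split: values that themselves contain commas are not supported.
    for part in filter(None, (p.strip() for p in m.group(2).split(","))):
        key, _, value = part.partition("=")
        params[key.strip().lower()] = value.strip()
    return m.group(1).lower(), params


def handle_instruction(statement, load_data):
    # Step 1: obtain the macro to call and the prompt information.
    name, params = parse_macro_call(statement)
    # Step 2: read the data the prompt information points to.
    data = load_data(params.pop("path"), params.pop("var"))
    # Step 3: look up the macro program for this macro.
    program = MACRO_REGISTRY[name]
    # Step 4: the remaining entries become execution parameters.
    return program(data, **params)


@macro("count_nonmissing")
def count_nonmissing(values, **_):
    return sum(v is not None for v in values)
```

Keeping the data loader as an argument mirrors the separation between the read step and the macro program in the method.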
Correspondingly, an embodiment of the present invention further provides a data processing device, including:
an acquisition module, configured to obtain a control instruction, input by a user, that carries a macro control statement, where the macro control statement includes the macro to be called and prompt information, and the prompt information includes the storage path of the data represented by a variable to be processed and the operating parameters for processing that data;
a first read module, configured to respond to the control instruction and read the data represented by the variable to be processed from the file at the data storage path according to the prompt information;
a second read module, configured to read the macro program corresponding to the macro from a specific file according to the macro; and
a data processing module, configured to set the execution parameters of the macro program to the operating parameters and use the macro program to perform the corresponding data processing on the data represented by the variable to be processed.
In the embodiments of the present invention, a control instruction carrying a macro control statement is obtained from the user. The macro control statement includes the macro to be called and prompt information that indicates the storage path of the data represented by the variable to be processed and the operating parameters for processing that data. The data represented by the variable to be processed is read according to the prompt information, the execution parameters of the macro program are set to the operating parameters, and the macro program corresponding to the macro performs the corresponding data processing on the data that was read. Because the data is processed automatically by the macro program, data processing is efficient and simple to operate.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another data processing method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of another data processing method according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of another data processing method according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of another data processing method according to an embodiment of the present invention;
Fig. 6 is a schematic flowchart of another data processing method according to an embodiment of the present invention;
Fig. 7 is a schematic flowchart of another data processing method according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a data processing device according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a first read module according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of another first read module according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a data processing module according to an embodiment of the present invention;
Fig. 12 is a schematic structural diagram of another data processing module according to an embodiment of the present invention;
Fig. 13 is a schematic structural diagram of another data processing module according to an embodiment of the present invention;
Fig. 14 is a schematic structural diagram of another data processing module according to an embodiment of the present invention.
Detailed description of embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The data processing method described in the embodiments of the present invention can be applied on the SAS (Statistical Analysis System) software platform: a control instruction is entered in the SAS software, and the macro control statement in the control instruction performs the corresponding processing on the data. The data processing can address variable preprocessing and prescreening in data-mining modeling, specifically including data quality checks, batch extraction of original variables, handling of missing values and outliers, automatic generation of derived variables, and univariate selection. The method needs no other supporting software beyond an installation of SAS, and it works the same whether SAS runs locally or on a server.
Refer to Fig. 1, which is a schematic flowchart of a data processing method according to an embodiment of the present invention. As shown in Fig. 1, the data processing method in this embodiment includes the following steps:
S100: Obtain a control instruction, input by the user, that carries a macro control statement. The macro control statement includes the macro to be called and prompt information, and the prompt information includes the storage path of the data represented by the variable to be processed and the operating parameters for processing that data.
In a specific embodiment, when the user needs to process the data represented by a variable to be processed, the user enters a control instruction, which can include a macro control statement; for example, the control instruction entered in the SAS editor is the macro control statement. The macro control statement includes the macro to be called and the prompt information. The macro to be called reflects which kind of data processing the user wants; for example, a macro in SAS may perform a data quality check, in which case the corresponding processing is a data quality check. The prompt information indicates the storage path of the data represented by the variable to be processed and the operating parameters for processing that data, and it can take various forms. For example, the prompt information can be the storage path of a table associated with the SAS software, where the table stores the storage path of the data represented by the variable to be processed; to obtain the data, the associated table is obtained first and the data's storage path is then read from it. Alternatively, the prompt information can be the storage path of the data itself, so that the data represented by the variable to be processed can be located directly. The operating parameters can be, for example, the names of the data variables, the date of the data to be extracted, and the types of derived variables to be calculated.
S101: In response to the control instruction, read the data represented by the variable to be processed from the file at the data storage path according to the prompt information.
In a specific embodiment, the entered control instruction is responded to. For example, when the method is used in SAS, the entered macro control statement is executed, and the data represented by the variable to be processed is read from the file at the data storage path according to the prompt information in the macro control statement. Because the prompt information can take various forms, the data can also be obtained from it in various ways.
S102: Read the macro program corresponding to the macro from a specific file according to the macro.
In a specific embodiment, the macro program corresponding to the macro in the macro control statement is read from a specific file. Note that the macro programs can be stored in a single file or across multiple files; the corresponding macro program is located by the macro name given in the control instruction.
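Looking up a macro program by name across one or more files can be sketched as a text search over SAS source files for a `%macro <name> ... %mend` block. This is an assumption about how the "specific file" is organized; the sketch also assumes macro definitions are not nested, since it stops at the first `%mend` after the definition.

```python
import re
from pathlib import Path


def find_macro_source(paths, name):
    """Return the '%macro <name> ... %mend ...;' block from the first file that defines it.

    Assumes each macro program lives in a plain-text SAS source file and that
    definitions are not nested (non-greedy match up to the first %mend).
    """
    paths = list(paths)
    pattern = re.compile(
        rf"%macro\s+{re.escape(name)}\b.*?%mend\b[^;]*;",
        re.IGNORECASE | re.DOTALL,
    )
    for path in paths:
        match = pattern.search(Path(path).read_text(encoding="utf-8"))
        if match:
            return match.group(0)
    raise LookupError(f"macro {name!r} not found in {len(paths)} file(s)")
```

The word boundary after the name keeps a lookup for `varchek` from matching a hypothetical `varchek2`.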
S103: Set the execution parameters of the macro program to the operating parameters, and use the macro program to perform the corresponding data processing on the data represented by the variable to be processed.
In a specific embodiment, the corresponding macro program is called and its execution parameters are set to the operating parameters, so that the corresponding processing is performed on the data represented by the variable to be processed. Specifically, this processing can be a data quality check, batch extraction of original variables, handling of missing values and outliers, automatic generation of derived variables, univariate selection, and so on.
In the embodiments of the present invention, a control instruction carrying a macro control statement is obtained from the user. The macro control statement includes the macro to be called and prompt information that indicates the storage path of the data represented by the variable to be processed and the operating parameters for processing that data. The data represented by the variable to be processed is read according to the prompt information, the execution parameters of the macro program are set to the operating parameters, and the macro program corresponding to the macro performs the corresponding data processing on the data that was read. Because the data is processed automatically by the macro program, data processing is efficient and simple to operate.
Refer to Fig. 2, which is a schematic flowchart of another data processing method according to an embodiment of the present invention. The data processing method in this embodiment includes the following steps:
S200: Obtain a control instruction, input by the user, that carries a macro control statement. The macro control statement includes the macro to be called and prompt information, and the prompt information includes the storage path of the data represented by the variable to be processed and the operating parameters for processing that data.
In a specific embodiment, for step S200, refer to step S100 of the embodiment shown in Fig. 1; details are not repeated here.
S201: In response to the control instruction, obtain the preset table according to the prompt information, read from it the storage path of the dataset containing the data represented by the variable to be processed, and take that dataset storage path as the data storage path.
In a specific embodiment, the prompt information is the storage path of a preset table, and the preset table stores the storage path of the dataset containing the data represented by the variable to be processed.
In SAS, the preset table is associated with the SAS software. It records, for each variable to be processed entered by the user, the logical library of the dataset containing that variable, that is, the dataset storage path. When the software runs, the dataset storage path of the variable to be processed can therefore be obtained from the preset table.
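The preset-table lookup in step S201 can be sketched as follows. The table layout (CSV text with `variable` and `dataset_path` columns) and the case-insensitive match on variable names are hypothetical stand-ins for the preset table associated with SAS; the table contents are passed as text to keep the sketch short.

```python
import csv
import io


def resolve_dataset_path(table_text, variable):
    """Return the dataset storage path recorded for `variable` in the preset table.

    Hypothetical layout: CSV with 'variable' and 'dataset_path' columns.
    SAS names are case-insensitive, so the match ignores case.
    """
    for row in csv.DictReader(io.StringIO(table_text)):
        if row["variable"].strip().lower() == variable.lower():
            return row["dataset_path"].strip()
    raise KeyError(variable)
```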
S202: Obtain the dataset from the file at the data storage path, and obtain the data represented by the variable to be processed from the dataset.
In a specific embodiment, the dataset is obtained from the file at the data storage path, and the data represented by the variable to be processed is then extracted from it. For example, if the variable to be processed is customers' January spending amount, the spending amounts of all customers are extracted from the January dataset.
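Extracting one variable from a dataset, as in step S202, can be sketched like this. Storing the dataset as a CSV file is an assumption (the patent works with SAS datasets), and representing empty cells as `None` is a choice made so that later steps can count missing values.

```python
import csv


def read_variable(dataset_path, variable):
    """Return the values of one column of a CSV 'dataset' as floats.

    Empty cells become None so that later steps can treat them as missing.
    CSV storage is an assumption; the patent reads SAS datasets.
    """
    with open(dataset_path, newline="", encoding="utf-8") as f:
        return [
            float(row[variable]) if row[variable].strip() else None
            for row in csv.DictReader(f)
        ]
```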
S203: Read the macro program corresponding to the macro from a specific file according to the macro.
In a specific embodiment, for step S203, refer to step S102 of the embodiment shown in Fig. 1; details are not repeated here.
S204: Set the execution parameters of the macro program to the operating parameters, and use the macro program to perform the corresponding data processing on the data represented by the variable to be processed.
In a specific embodiment, for step S204, refer to step S103 of the embodiment shown in Fig. 1; details are not repeated here.
In the embodiments of the present invention, a control instruction carrying a macro control statement is obtained from the user. The macro control statement includes the macro to be called and prompt information that indicates the storage path of the data represented by the variable to be processed and the operating parameters for processing that data. The data represented by the variable to be processed is read according to the prompt information, the execution parameters of the macro program are set to the operating parameters, and the macro program corresponding to the macro performs the corresponding data processing on the data that was read. Because the data is processed automatically by the macro program, data processing is efficient and simple to operate.
Refer to Fig. 3, which is a schematic flowchart of another data processing method according to an embodiment of the present invention. The data processing method in this embodiment includes the following steps:
S300: Obtain a control instruction, input by the user, that carries a macro control statement. The macro control statement includes the macro to be called and prompt information, and the prompt information includes the storage path of the data represented by the variable to be processed and the operating parameters for processing that data.
In a specific embodiment, for step S300, refer to step S100 of the embodiment shown in Fig. 1; details are not repeated here.
S301: In response to the control instruction, take the dataset storage path as the data storage path.
In a specific embodiment, the prompt information is the storage path of the dataset containing the data represented by the variable to be processed, so that dataset storage path is taken directly as the data storage path.
S302: Obtain the dataset from the file at the data storage path according to the prompt information, and obtain the data represented by the variable to be processed from the dataset.
In a specific embodiment, the dataset is obtained according to the dataset storage path in the prompt information, and the data represented by the variable to be processed is looked up in the dataset.
S303: Read the macro program corresponding to the macro from a specific file according to the macro.
In a specific embodiment, for step S303, refer to step S102 of the embodiment shown in Fig. 1; details are not repeated here.
S304: Set the execution parameters of the macro program to the operating parameters, and use the macro program to perform the corresponding data processing on the data represented by the variable to be processed.
In a specific embodiment, for step S304, refer to step S103 of the embodiment shown in Fig. 1; details are not repeated here.
In the embodiments of the present invention, a control instruction carrying a macro control statement is obtained from the user. The macro control statement includes the macro to be called and prompt information that indicates the storage path of the data represented by the variable to be processed and the operating parameters for processing that data. The data represented by the variable to be processed is read according to the prompt information, the execution parameters of the macro program are set to the operating parameters, and the macro program corresponding to the macro performs the corresponding data processing on the data that was read. Because the data is processed automatically by the macro program, data processing is efficient and simple to operate.
Refer to Fig. 4, which is a schematic flowchart of another data processing method according to an embodiment of the present invention. The data processing method in this embodiment includes the following steps:
S400: Obtain a control instruction, input by the user, that carries a macro control statement. The macro control statement includes the macro to be called and prompt information, and the prompt information includes the storage path of the data represented by the variable to be processed and the operating parameters for processing that data.
In a specific embodiment, for step S400, refer to step S100 of the embodiment shown in Fig. 1; details are not repeated here.
S401: In response to the control instruction, read the data represented by the variable to be processed from the file at the data storage path according to the prompt information.
In a specific embodiment, for step S401, refer to step S101 of the embodiment shown in Fig. 1; details are not repeated here.
S402: Read the macro program corresponding to the macro from a specific file according to the macro.
In a specific embodiment, for step S402, refer to step S102 of the embodiment shown in Fig. 1; details are not repeated here.
S403: Call the macro program.
S404: Set the execution parameters of the macro program to the operating parameters.
In a specific embodiment, the operating parameters can be the variable name, the output format, the statistical method, and so on.
S405: According to the operating parameters, the macro program computes statistics on the data represented by the variable to be processed in a preset output format and outputs the resulting statistics.
In a specific embodiment, this macro program is used to check data quality, so that the user can view the distribution of the variable (whether it has data, how the data is distributed, and whether it conforms to business rules) and thereby decide whether to extract the variable as an original variable for modeling.
Specifically, the statistical processing can compute, for the data represented by the variable, the number of records, mean, missing values, minimum, quantiles, intervals, per-interval record counts, per-interval percentages, cumulative percentages, maximum, and so on; the specific statistics can be preset by the user. For example, in SAS, when the macro program %varchek is called for a numeric variable without a specified output format, it computes the number of records, mean, missing values, minimum, quantiles, and maximum of the data represented by the variable. For character variables, and for numeric variables with a specified output format, it computes the per-interval record count, per-interval percentage, cumulative percentage, and so on.
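The two kinds of profile described for %varchek can be approximated in Python as below. This is a sketch of the described statistics, not the %varchek macro itself: the `formatter` argument is a hypothetical stand-in for a SAS output format that maps values to intervals, and Python's default "exclusive" quantile method may give slightly different cut points than SAS.

```python
import statistics
from collections import Counter


def numeric_profile(values, quantile_points=4):
    """Record count, missing count, mean, min, quantile cut points and max."""
    present = [v for v in values if v is not None]
    profile = {"n": len(values), "missing": len(values) - len(present)}
    if present:
        profile.update(
            mean=statistics.fmean(present),
            min=min(present),
            max=max(present),
            # statistics.quantiles needs at least two data points.
            quantiles=statistics.quantiles(present, n=quantile_points) if len(present) > 1 else [],
        )
    return profile


def frequency_profile(values, formatter=str):
    """Per-interval count, percent and cumulative percent for a formatted variable."""
    counts = Counter(formatter(v) for v in values)
    total = len(values)
    rows, cumulative = [], 0
    for level in sorted(counts):
        cumulative += counts[level]
        rows.append((level, counts[level], 100 * counts[level] / total, 100 * cumulative / total))
    return rows
```

For example, `numeric_profile([1, 2, 3, 4, None])` reports one missing value, a mean of 2.5, and a median cut point of 2.5.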
Further, when outputting the statistics, links for all variables can be collected on a single HTML page; clicking a variable shows its distribution, that is, the statistics computed for the data represented by that variable.
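A single-page HTML summary with one link per variable can be sketched as follows. The page layout, the use of variable names as anchor ids, and the input shape (a dict of variable name to statistics) are illustrative assumptions; values are HTML-escaped before insertion.

```python
import html


def render_summary_page(profiles):
    """Build one HTML page: an index of variable links plus one table per variable.

    `profiles` maps variable name -> dict of statistics. Layout and anchor
    naming are illustrative assumptions.
    """
    parts = ["<html><body><h1>Variable summary</h1><ul>"]
    for name in profiles:
        anchor = html.escape(name, quote=True)
        parts.append(f'<li><a href="#{anchor}">{html.escape(name)}</a></li>')
    parts.append("</ul>")
    for name, stats in profiles.items():
        parts.append(f'<h2 id="{html.escape(name, quote=True)}">{html.escape(name)}</h2><table>')
        for key, value in stats.items():
            parts.append(f"<tr><td>{html.escape(str(key))}</td><td>{html.escape(str(value))}</td></tr>")
        parts.append("</table>")
    parts.append("</body></html>")
    return "\n".join(parts)
```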
After the quality of the data represented by the variable to be processed has been checked, that data can be extracted by calling the macro program %varget, which can also give the data new variable names; running it yields the renamed data.
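Extracting selected variables and renaming them can be sketched as a column projection with a rename map. This only loosely mirrors the described %varget behavior; the row-of-dicts data shape and the `mapping` parameter (original name to new name) are assumptions.

```python
def extract_and_rename(rows, mapping):
    """Keep only the columns named in `mapping`, renamed to their new names.

    `rows` is a list of dicts; `mapping` maps original name -> new name.
    Raises KeyError if a requested column is absent from the first row.
    """
    missing = [old for old in mapping if rows and old not in rows[0]]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [{new: row[old] for old, new in mapping.items()} for row in rows]
```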
In the embodiments of the present invention, a control instruction carrying a macro control statement is obtained from the user. The macro control statement includes the macro to be called and prompt information that indicates the storage path of the data represented by the variable to be processed and the operating parameters for processing that data. The data represented by the variable to be processed is read according to the prompt information, the execution parameters of the macro program are set to the operating parameters, and the macro program corresponding to the macro performs the corresponding data processing on the data that was read. Because the data is processed automatically by the macro program, data processing is efficient and simple to operate.
Refer to Fig. 5, which is a schematic flowchart of another data processing method according to an embodiment of the present invention. The data processing method in this embodiment includes the following steps:
S500: Obtain a control instruction, input by the user, that carries a macro control statement. The macro control statement includes the macro to be called and prompt information, and the prompt information includes the storage path of the data represented by the variable to be processed and the operating parameters for processing that data.
In a specific embodiment, for step S500, refer to step S100 of the embodiment shown in Fig. 1; details are not repeated here.
S501: In response to the control instruction, read the data represented by the variable to be processed from the file at the data storage path according to the prompt information.
In a specific embodiment, for step S501, refer to step S101 of the embodiment shown in Fig. 1; details are not repeated here.
S502: Read the macro program corresponding to the macro from a specific file according to the macro.
In a specific embodiment, for step S502, refer to step S102 of the embodiment shown in Fig. 1; details are not repeated here.
S503: Call the macro program.
S504: Set the execution parameters of the macro program to the operating parameters.
In a specific embodiment, the operating parameters can be, for example, the default value used to replace missing values.
S504: According to the operating parameters, the macro program computes the miss rate of the data represented by the variable to be processed and outputs the miss rate.
In a specific embodiment, before the data represented by a variable to be processed is used for modeling, its missing values and outliers usually need to be handled. The macro program first computes the miss rate of the data and outputs it, so that the user understands how much of the data represented by the variable is missing.
S505: The macro program replaces the missing values in the data represented by the variable to be processed with the default value.
In a specific embodiment, the macro program replaces the missing values in the data represented by the variable to be processed with the default value, for example replacing all missing values with 0. Note that the predetermined threshold can be set by the user according to actual conditions.
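Computing the miss rate and replacing missing values with a default can be sketched in a few lines. Representing missing values as `None` and defaulting the replacement to 0 (the example value given above) are assumptions of this sketch.

```python
def fill_missing(values, default=0):
    """Return (miss_rate, filled) for a list where None marks a missing value.

    miss_rate is the fraction of None entries; filled is a copy with each
    None replaced by `default`.
    """
    if not values:
        return 0.0, []
    n_missing = sum(v is None for v in values)
    filled = [default if v is None else v for v in values]
    return n_missing / len(values), filled
```

Returning the rate together with the filled data lets the caller report the rate first, matching the order of the two steps above.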
In the embodiments of the present invention, a control instruction carrying a macro control statement is obtained from the user. The macro control statement includes the macro to be called and prompt information that indicates the storage path of the data represented by the variable to be processed and the operating parameters for processing that data. The data represented by the variable to be processed is read according to the prompt information, the execution parameters of the macro program are set to the operating parameters, and the macro program corresponding to the macro performs the corresponding data processing on the data that was read. Because the data is processed automatically by the macro program, data processing is efficient and simple to operate.
Refer to Fig. 6, which is a schematic flowchart of another data processing method according to an embodiment of the present invention. The data processing method in this embodiment includes the following steps:
S600: Obtain a control instruction, input by the user, that carries a macro control statement. The macro control statement includes the macro to be called and prompt information, and the prompt information includes the storage path of the data represented by the variable to be processed and the operating parameters for processing that data.
In a specific embodiment, for step S600, refer to step S100 of the embodiment shown in Fig. 1; details are not repeated here.
S601: In response to the control instruction, read the data represented by the variable to be processed from the file at the data storage path according to the prompt information.
In a specific embodiment, for step S601, refer to step S101 of the embodiment shown in Fig. 1; details are not repeated here.
S602: Read the macro program corresponding to the macro from a specific file according to the macro.
In a specific embodiment, for step S602, refer to step S102 of the embodiment shown in Fig. 1; details are not repeated here.
S603: Call the macro program.
S604: Set the execution parameters of the macro program to the operating parameters.
In a specific embodiment, the operating parameters can be the names of the original variables to be used, the types of derived variables to calculate, and so on.
S605: Based on the operating parameters and the data represented by the variable to be processed, the macro program calculates the preset derived variables of the variable to be processed and outputs them.
In a specific embodiment, to increase the predictive power of a variable to be processed, it often needs to be transformed in various ways, for example by generating derived variables such as the maximum, minimum, mean, and a trend variable of the data it represents. Typically the user sets the derived variables to be calculated according to actual conditions; the macro program calculates these preset derived variables with statistical methods and outputs them, so that the user can analyze the distribution of the data represented by the variable from the derived variables.
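Derived variables of the kinds listed above can be sketched for one customer's series of periodic values. Treating the "trend variable" as the ordinary least-squares slope against the period index 0, 1, 2, ... is an assumption, as are the `prefix` naming scheme and skipping missing periods.

```python
import statistics


def derive_features(series, prefix):
    """Max, min, mean and a linear-trend slope for one customer's series.

    None entries are skipped. The trend is the least-squares slope against
    the period index (an assumed definition of 'trend variable').
    """
    present = [(i, v) for i, v in enumerate(series) if v is not None]
    if not present:
        return {}
    xs = [i for i, _ in present]
    ys = [v for _, v in present]
    features = {
        f"{prefix}_max": max(ys),
        f"{prefix}_min": min(ys),
        f"{prefix}_mean": statistics.fmean(ys),
    }
    if len(present) >= 2:
        x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        features[f"{prefix}_trend"] = sxy / sxx
    return features
```

For monthly spending of 100, 110 and 120, the sketch yields a mean of 110 and a trend of 10 per month.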
In the embodiment of the present invention, the control instruction for carrying macro control statement of user input is obtained, in macro control statementIncluding needing call grand and for pointing out data storage path represented by pending variable and the data being carried outThe operating parameter for the treatment of, the data according to represented by prompt message reads pending variable, the execution parameter for setting macroprogram isOperating parameter, and the data represented by acquired pending variable are carried out at corresponding data using grand corresponding macroprogramReason.This data processing method, is capable of achieving automatically to process data using grand corresponding macroprogram, data-handling efficiencyHeight, it is simple to operate.
Referring to Fig. 7, which is a schematic flowchart of another data processing method provided by an embodiment of the present invention, the data processing method of this embodiment includes the following steps:
S700: Obtain a control instruction entered by the user that carries a macro control statement, where the macro control statement includes the macro to be called and prompt information, and the prompt information includes the storage path of the data represented by the variable to be processed and the operating parameters for processing the data.
In a specific embodiment, for step S700 of this embodiment, refer to step S100 of the embodiment shown in Fig. 1; details are not repeated here.
S701: In response to the control instruction, read the data represented by the variable to be processed from the file at the data storage path according to the prompt information.
In a specific embodiment, for step S701 of this embodiment, refer to step S101 of the embodiment shown in Fig. 1; details are not repeated here.
S702: Read the macro program corresponding to the macro from a specific file according to the macro.
In a specific embodiment, for step S702 of this embodiment, refer to step S102 of the embodiment shown in Fig. 1; details are not repeated here.
S703: Call the macro program.
S704: Set the execution parameters of the macro program to the operating parameters.
In a specific embodiment, the operating parameters may include the number of clusters into which the variables are to be divided, the number of variables to be retained in the end, and so on.
S705: The macro program calculates the information value of each variable according to the data represented by that variable.
In a specific embodiment, the variable to be processed includes at least one variable. To pre-screen the variables it contains and retain those with predictive power, the macro program %var_chose is called first to calculate the information value of each binary or continuous variable to be processed. Each variable has one information value. It should be noted that the information value of a variable indicates its predictive power for the target variable.
S706: The macro program classifies the at least one variable into at least one cluster.
In a specific embodiment, to eliminate the effect of collinearity among the variables on the model, the macro program %varclus is called to cluster the variables, placing collinear variables in the same cluster. For example, in users' credit card records, the number of purchases and the purchase amount are placed in one cluster. All variables to be processed are clustered in this way, yielding multiple clusters.
S707: The macro program obtains the variable with the highest information value in each cluster and determines these variables as the screened variables.
In a specific embodiment, to retain the variables with the strongest predictive power, the macro program is called to pick out the variable with the highest information value in each cluster. These variables are determined as the screened variables and output, so that the user can substitute them into a model for further modeling analysis. It should be noted that the purpose of screening the variables to be processed is to reduce their dimensionality.
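The screening described above can be sketched roughly as follows. The text does not give the formula used by %var_chose, so this sketch assumes the common weight-of-evidence definition of information value for a binned variable and a binary target, IV = Σ (p_non-event − p_event) · ln(p_non-event / p_event), with 0.5 added to each cell to avoid taking the logarithm of zero. The function names and the smoothing constant are hypothetical, and a continuous variable would first have to be binned. The selection step keeps the highest-IV variable in each cluster, given a cluster assignment such as the one %varclus would produce.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def information_value(bins: Sequence[Hashable], target: Sequence[int], eps: float = 0.5) -> float:
    """Information value of a binned variable against a binary target (1 = event, 0 = non-event)."""
    events = Counter(b for b, t in zip(bins, target) if t == 1)
    non_events = Counter(b for b, t in zip(bins, target) if t == 0)
    categories = set(bins)
    k = len(categories)
    total_e = sum(events.values())
    total_n = sum(non_events.values())
    iv = 0.0
    for c in categories:
        # Smoothed share of all events / non-events that fall into bin c (eps is an assumption).
        p_e = (events[c] + eps) / (total_e + eps * k)
        p_n = (non_events[c] + eps) / (total_n + eps * k)
        # Each term is (difference in shares) * (weight of evidence) and is never negative.
        iv += (p_n - p_e) * math.log(p_n / p_e)
    return iv


def select_per_cluster(iv: Dict[str, float], clusters: Dict[str, int]) -> List[str]:
    """Keep the variable with the highest information value in each cluster."""
    best: Dict[int, str] = {}
    for var, cluster_id in clusters.items():
        if cluster_id not in best or iv[var] > iv[best[cluster_id]]:
            best[cluster_id] = var
    return sorted(best.values())
```

A bin that separates events from non-events perfectly gets a large IV, while a bin pattern independent of the target gets an IV of zero; selection then reduces each cluster to its single most predictive member.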
The following describes a specific implementation of a data processing apparatus provided by an embodiment of the present invention.
Referring to Fig. 8, which is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present invention. As shown in Fig. 8, the data processing apparatus of this embodiment includes: an acquisition module 100, a first reading module 101, a second reading module 102, and a data processing module 103.
Acquisition module 100: configured to obtain a control instruction entered by the user that carries a macro control statement, where the macro control statement includes the macro to be called and prompt information, and the prompt information includes the storage path of the data represented by the variable to be processed and the operating parameters for processing the data.
In a specific embodiment, when the user needs to perform data processing on the data represented by a variable to be processed, the user enters a control instruction, which may include a macro control statement, and the acquisition module 100 obtains this control instruction. For example, the user enters the control instruction, namely a macro control statement, in the SAS editor. The macro control statement includes the macro to be called and the prompt information. The macro to be called reflects which kind of data processing the user wants to perform; the prompt information indicates the storage path of the data represented by the variable to be processed and the operating parameters for processing the data. For example, a macro in SAS may perform a data quality check, in which case the corresponding data processing is the quality check. It should be noted that the prompt information may take various forms. For example, in SAS the prompt information may be the storage path of a table associated with the SAS software, where the table stores the storage path of the data represented by the variable to be processed; to obtain the data, the associated table is obtained first, and the storage path of the data is then read from the table. Alternatively, the prompt information may be the storage path of the data represented by the variable itself, so that the data can be found directly through the prompt information. The operating parameters may be the names of data variables, the date of the data to be extracted, the types of derived variables to be calculated, and so on.
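To make the structure of a macro control statement concrete, the sketch below splits a SAS-style keyword macro call such as `%varchek(table=..., var=...)` into the macro name and its parameters, which carry the prompt information and the operating parameters. The parameter names used in the example are hypothetical. The parser is deliberately simple: it splits on commas, so it does not handle quoted or nested arguments, and it is not how the SAS macro processor itself resolves calls.

```python
import re
from typing import Dict, Tuple

# %name or %name(key=value, ...), optionally followed by a semicolon.
_CALL = re.compile(r"^\s*%(?P<name>[A-Za-z_]\w*)\s*(?:\((?P<args>.*)\))?\s*;?\s*$", re.S)


def parse_macro_call(statement: str) -> Tuple[str, Dict[str, str]]:
    """Return (macro name, keyword parameters) for a simple keyword macro call."""
    match = _CALL.match(statement)
    if match is None:
        raise ValueError(f"not a macro call: {statement!r}")
    params: Dict[str, str] = {}
    for part in (match.group("args") or "").split(","):
        if not part.strip():
            continue  # tolerate empty argument lists and trailing commas
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {part.strip()!r}")
        params[key.strip().lower()] = value.strip()
    return match.group("name").lower(), params
```

For example, `parse_macro_call("%varchek(table=cfg.var_paths, var=amount, fmt=);")` yields the macro name `varchek` together with a hypothetical prompt-information parameter (`table`) and two operating parameters (`var`, `fmt`).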
First reading module 101: configured to respond to the control instruction and read the data represented by the variable to be processed from the file at the data storage path according to the prompt information.
In a specific embodiment, the first reading module 101 responds to the entered control instruction; for example, when this data processing method is used in SAS, it responds to the entered macro control statement. The first reading module 101 then reads the data represented by the variable to be processed from the file at the data storage path according to the prompt information in the macro control statement. Because the prompt information can take several forms, there are correspondingly several ways of obtaining the data represented by the variable according to it.
Second reading module 102: configured to read the macro program corresponding to the macro from a specific file according to the macro.
In a specific embodiment, the second reading module 102 reads the macro program corresponding to the macro in the macro control statement from a specific file. It should be noted that macro programs may be stored in one file or in multiple files; the corresponding macro program can be read according to the name of the macro in the control instruction.
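A minimal sketch of the second reading module follows, assuming the macro programs are stored as SAS source text containing `%macro name(...); ... %mend;` definitions spread over one or more files. It indexes every definition by name so that the macro name parsed from the control instruction can be looked up. The function names are hypothetical, and the regular expression is a simplification that does not handle nested macro definitions or a `%mend` inside comments or strings.

```python
import re
from typing import Dict, Iterable

# One SAS macro definition: from "%macro name" up to the next "%mend ...;".
_MACRO_DEF = re.compile(r"%macro\s+(?P<name>[A-Za-z_]\w*)\b.*?%mend\b[^;]*;", re.I | re.S)


def index_macros(sources: Iterable[str]) -> Dict[str, str]:
    """Map lower-cased macro names to their full definitions across all source texts."""
    library: Dict[str, str] = {}
    for text in sources:
        for match in _MACRO_DEF.finditer(text):
            library[match.group("name").lower()] = match.group(0)
    return library


def load_macro(name: str, sources: Iterable[str]) -> str:
    """Return the definition of the named macro, or raise LookupError if none exists."""
    library = index_macros(sources)
    try:
        return library[name.lower()]
    except KeyError:
        raise LookupError(f"macro %{name} not found") from None
```

With real files, `sources` could be built as `[Path(p).read_text() for p in macro_files]` using `pathlib.Path`, where `macro_files` is a hypothetical list of file paths; whether one file or many is used does not change the lookup.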
Data processing module 103: configured to set the execution parameters of the macro program to the operating parameters and use the macro program to perform the corresponding data processing on the data represented by the variable to be processed.
In a specific embodiment, the data processing module 103 calls the corresponding macro program and sets its execution parameters to the operating parameters, thereby performing the corresponding data processing on the data represented by the variable to be processed. Specifically, this data processing may be data quality checking, batch extraction of original variables, handling of missing values and outliers, automatic generation of derived variables, univariate selection, and so on.
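The data processing module's job of binding the operating parameters to the execution parameters of a macro program can be illustrated with a small registry of Python functions standing in for macro programs. This is an analogy under stated assumptions, not SAS behavior: the registry, the `varget` stand-in (loosely modeled on the extract-and-rename behavior the text attributes to %varget), and the rule of rejecting parameters the program does not declare are all hypothetical design choices.

```python
import inspect
from typing import Any, Callable, Dict, List, Mapping, Optional

# Hypothetical registry standing in for the macro library: name -> callable.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def macro(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function under a macro name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[name.lower()] = fn
        return fn
    return register


def run_macro(name: str, data: Any, operating_params: Mapping[str, Any]) -> Any:
    """Set the program's execution parameters from the operating parameters, then run it."""
    program = REGISTRY[name.lower()]
    # The first parameter receives the data; the remaining ones are execution parameters.
    declared = list(inspect.signature(program).parameters)[1:]
    unknown = set(operating_params) - set(declared)
    if unknown:
        raise TypeError(f"%{name} does not accept parameters {sorted(unknown)}")
    return program(data, **operating_params)


@macro("varget")
def extract_variable(rows: List[Dict[str, Any]], var: str, new_name: Optional[str] = None) -> List[Dict[str, Any]]:
    """Stand-in program: extract one variable from each row, optionally renaming it."""
    return [{(new_name or var): row[var]} for row in rows]
```

Validating parameter names before the call means a mistyped operating parameter fails loudly instead of being silently ignored.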
Referring to Fig. 9, which is a schematic structural diagram of a first reading module provided by an embodiment of the present invention. As shown in Fig. 9, the first reading module of this embodiment includes: a first acquisition unit 1010 and a second acquisition unit 1011.
First acquisition unit 1010: configured to obtain the preset table according to the prompt information, read from the preset table the storage path of the data set containing the data represented by the variable to be processed, and determine that data set storage path as the data storage path.
In a specific embodiment, when the prompt information is the storage path of the preset table, the preset table stores the storage path of the data set containing the data represented by the variable to be processed.
For example, in SAS, the preset table is associated with the SAS software, and the user enters in it the logical library of the data set containing the variable to be processed, that is, the data set storage path. When the software runs, the first acquisition unit 1010 obtains from this preset table the data set storage path of the variable to be processed.
Second acquisition unit 1011: configured to obtain the data set from the file at the data storage path and obtain the data represented by the variable to be processed from the data set.
In a specific embodiment, the second acquisition unit 1011 obtains the data set from the file at the data storage path and then obtains the data represented by the variable to be processed from the data set. For example, if the variable to be processed is users' January spending amount, the spending amounts of all customers are extracted from the January data set.
Referring to Fig. 10, which is a schematic structural diagram of another first reading module provided by an embodiment of the present invention. As shown in Fig. 10, the first reading module of this embodiment includes: a determining unit 1012 and a third acquisition unit 1013.
Determining unit 1012: configured to determine the data set storage path as the data storage path.
In a specific embodiment, when the prompt information is the storage path of the data set containing the data represented by the variable to be processed, the determining unit 1012 directly determines that data set storage path as the data storage path.
Third acquisition unit 1013: configured to obtain the data set from the file at the data storage path according to the prompt information and obtain the data represented by the variable to be processed from the data set.
In a specific embodiment, the third acquisition unit 1013 obtains the data set according to the data set storage path in the prompt information and finds the data represented by the variable to be processed in the data set.
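The two forms of prompt information described for Fig. 9 (a preset table that maps the variable to its data set path) and Fig. 10 (the data set path itself) lead to the same data storage path, from which the variable's values are read. The sketch below shows this under simplifying assumptions: an in-memory mapping from paths to lists of rows stands in for the file system, the preset table is a plain dictionary rather than a stored table, and all class, field, and path names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional, Sequence

Row = Mapping[str, Any]


@dataclass
class PromptInfo:
    variable: str
    dataset_path: Optional[str] = None                # direct form (Fig. 10)
    preset_table: Optional[Mapping[str, str]] = None  # table form (Fig. 9): variable -> data set path


def resolve_storage_path(prompt: PromptInfo) -> str:
    """Determine the data storage path from either form of prompt information."""
    if prompt.dataset_path is not None:
        return prompt.dataset_path
    if prompt.preset_table is not None:
        return prompt.preset_table[prompt.variable]
    raise ValueError("prompt information carries no storage location")


def read_variable(prompt: PromptInfo, datasets: Mapping[str, Sequence[Row]]) -> List[Any]:
    """Fetch the data set at the resolved path and extract the variable's column."""
    rows = datasets[resolve_storage_path(prompt)]
    return [row[prompt.variable] for row in rows]
```

Following the January-spending example, both `PromptInfo("jan_spend", preset_table={"jan_spend": "/data/2014/jan"})` and `PromptInfo("jan_spend", dataset_path="/data/2014/jan")` read the same column from the hypothetical `/data/2014/jan` data set.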
Referring to Fig. 11, which is a schematic structural diagram of a data processing module provided by an embodiment of the present invention. As shown in Fig. 11, the data processing module of this embodiment includes: a first calling unit 1030, a first setting unit 1031, and a statistical processing unit 1032.
First calling unit 1030: configured to call the macro program.
First setting unit 1031: configured to set the execution parameters of the macro program to the operating parameters.
In a specific embodiment, the operating parameters may include variable names, the output format, the statistical method, and so on.
Statistical processing unit 1032: configured to use the macro program to perform statistical processing on the data represented by the variable to be processed according to the operating parameters and a preset output format, and output the data after statistical processing.
In a specific embodiment, this macro program is used for data quality checking, so that the user can examine the distribution of the variable (whether it has data, how the data are distributed, and whether they conform to business rules) and thereby decide whether to extract the variable as an original variable for modeling.
Specifically, the statistical processing may consist of the statistical processing unit 1032 counting, for the data represented by the variable, the number of records, the mean, missing values, the minimum, each quantile, intervals, the record count per interval, the percentage per interval, the cumulative percentage, the maximum, and so on. The specific statistical processing may be preset by the user. For example, in SAS, when the macro program %varchek is called, for a numeric variable without a specified output format, the number of records, mean, missing values, minimum, each quantile, and maximum of the data represented by the variable are counted; for a character variable, or a numeric variable with a specified output format, the record count per interval, the percentage per interval, and the cumulative percentage of the variable are counted.
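In the spirit of the %varchek description above, a rough Python sketch of the two kinds of quality-check statistics follows: numeric summaries for a numeric variable without an output format, and per-interval record counts, percentages, and cumulative percentages for a character variable or for a numeric variable mapped through a format. Using `None` to mark missing values, a Python callable in place of a SAS output format, and quartiles as the reported quantiles are all assumptions; the function names are hypothetical.

```python
from collections import Counter
from statistics import mean, quantiles
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


def numeric_profile(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Record count, missing count, mean, min, quartiles and max; None marks a missing value."""
    present = [v for v in values if v is not None]
    p25, median, p75 = quantiles(present, n=4, method="inclusive")
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "min": min(present),
        "p25": p25,
        "median": median,
        "p75": p75,
        "max": max(present),
    }


def interval_profile(
    values: Sequence[Any], fmt: Optional[Callable[[Any], str]] = None
) -> List[Tuple[str, int, float, float]]:
    """Per interval label: (label, record count, percentage, cumulative percentage)."""
    # A format maps each raw value to an interval label; otherwise the value itself is the label.
    labels = [fmt(v) if fmt else str(v) for v in values]
    counts = Counter(labels)
    rows, cumulative = [], 0.0
    for label in sorted(counts):
        pct = 100.0 * counts[label] / len(labels)
        cumulative += pct
        rows.append((label, counts[label], pct, cumulative))
    return rows
```

For example, `interval_profile([50, 150, 200, 20], fmt=lambda v: "<100" if v < 100 else ">=100")` reports two intervals with two records (50%) each, reaching a cumulative 100%.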
Further, the data after statistical processing are output. Specifically, links to all variables may be collected in a single-page HTML form; clicking a variable displays its distribution, that is, the statistically processed data represented by that variable.
After the quality check of the data represented by the variable to be processed, these data can be extracted by calling the macro program %varget, which can also rename the data according to a new variable naming scheme; running it yields the renamed data.
Referring to Fig. 12, which is a schematic structural diagram of another data processing module provided by an embodiment of the present invention. As shown in Fig. 12, the data processing module of this embodiment includes: a second calling unit 1033, a second setting unit 1034, a statistics output unit 1035, and a replacement unit 1036.
Second calling unit 1033: configured to call the macro program.
Second setting unit 1034: configured to set the execution parameters of the macro program to the operating parameters.
In a specific embodiment, the operating parameters may include the default value used to replace missing values, and so on.
Statistics output unit 1035: configured to use the macro program to calculate, according to the operating parameters, the missing rate of the data represented by the variable to be processed, and output the missing rate.
In a specific embodiment, before the data represented by the variable to be processed are used for modeling, missing values and outliers in those data usually need to be handled. The statistics output unit 1035 first uses the macro program to calculate the missing rate of the data represented by the variable and outputs it, so that the user understands how much of the data is missing.
Replacement unit 1036: configured to use the macro program to replace the missing values in the data represented by the variable to be processed with the default value.
In a specific embodiment, the replacement unit 1036 uses the macro program to replace the missing values in the data represented by the variable to be processed with the default value, for example by replacing all missing values with the number 0. It should be noted that the preset value may be set by the user according to actual conditions.
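The two steps of this data processing module (report the missing rate, then replace missing values with the default value taken from the operating parameters) reduce to a few lines. This sketch assumes that missing values are represented as `None`, whereas SAS uses its own missing-value notation; the function name is hypothetical.

```python
from typing import Any, List, Optional, Sequence, Tuple


def handle_missing(values: Sequence[Optional[Any]], default: Any) -> Tuple[float, List[Any]]:
    """Return the missing rate and a copy of the values with missing entries replaced by default."""
    if not values:
        return 0.0, []  # an empty variable has nothing missing and nothing to fill
    missing = sum(1 for v in values if v is None)
    filled = [default if v is None else v for v in values]
    return missing / len(values), filled
```

For instance, `handle_missing([12.5, None, 7.0, None], 0)` reports a missing rate of 0.5 and returns `[12.5, 0, 7.0, 0]`, matching the example of replacing every missing value with 0.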
Referring to Fig. 13, which is a schematic structural diagram of another data processing module provided by an embodiment of the present invention. As shown in Fig. 13, the data processing module of this embodiment includes: a third calling unit 1037, a third setting unit 1038, and a calculation output unit 1039.
Third calling unit 1037: configured to call the macro program.
Third setting unit 1038: configured to set the execution parameters of the macro program to the operating parameters.
In a specific embodiment, the operating parameters may include the names of the original variables to be calculated, the types of derived variables to be calculated, and so on.
Calculation output unit 1039: configured to use the macro program to calculate, based on the operating parameters and the data represented by the variable to be processed, the preset derived variables of the variable, and output the preset derived variables.
In a specific embodiment, increasing the predictive power of a variable to be processed usually requires applying various transformations to it, for example generating derived variables. A derived variable may be the maximum, minimum, or mean of the data represented by the variable, a trend variable, and so on. The user typically sets the preset derived variables to be calculated according to actual conditions. The calculation output unit 1039 uses the macro program to calculate the preset derived variables of the variable by statistical methods and outputs them, so that the user can analyze the distribution of the data represented by the variable from the output derived variables.
Referring to Fig. 14, which is a schematic structural diagram of another data processing module provided by an embodiment of the present invention. As shown in Fig. 14, the data processing module of this embodiment includes: a fourth calling unit 1040, a fourth setting unit 1041, a computing unit 1042, a classification unit 1043, and an obtaining and determining unit 1044.
Fourth calling unit 1040: configured to call the macro program.
Fourth setting unit 1041: configured to set the execution parameters of the macro program to the operating parameters.
In a specific embodiment, the operating parameters may include the number of clusters into which the variables are to be divided, the number of variables to be retained in the end, and so on.
Computing unit 1042: configured to use the macro program to calculate the information value of each variable according to the data represented by that variable.
In a specific embodiment, the variable to be processed includes at least one variable. To pre-screen the variables it contains and retain those with predictive power, the computing unit 1042 first calls the macro program %var_chose to calculate the information value of each binary or continuous variable to be processed. Each variable has one information value. It should be noted that the information value of a variable indicates its predictive power for the target variable.
Classification unit 1043: configured to use the macro program to classify the at least one variable into at least one cluster.
In a specific embodiment, to eliminate the effect of collinearity among the variables on the model, the classification unit 1043 calls the macro program %varclus to cluster the variables, placing collinear variables in the same cluster. For example, in users' credit card records, the number of purchases and the purchase amount are placed in one cluster. All variables to be processed are clustered in this way, yielding multiple clusters.
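The text does not specify the algorithm inside %varclus (SAS's own PROC VARCLUS performs oblique principal component clustering). As a plainly simpler substitute, the sketch below groups variables whose absolute Pearson correlation reaches a threshold and merges groups transitively with union-find. The 0.8 threshold, the function names, and the sample columns are assumptions. The output is a cluster id per variable, which is the form the selection step needs.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cluster_collinear(columns: Dict[str, Sequence[float]], threshold: float = 0.8) -> Dict[str, int]:
    """Assign a cluster id to each variable; strongly correlated variables share an id."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(a: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    roots: Dict[str, int] = {}
    # Number the clusters in order of first appearance.
    return {name: roots.setdefault(find(name), len(roots)) for name in names}
```

In the credit card example, a purchase-count column and a purchase-amount column that rise together land in one cluster, while an unrelated column such as age forms its own. Because merging is transitive, a chain of moderately correlated variables can end up in one cluster, which is a known weakness of this single-link approach.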
Obtaining and determining unit 1044: configured to use the macro program to obtain the variable with the highest information value in each cluster and determine these variables as the screened variables.
In a specific embodiment, to retain the variables with the strongest predictive power, the obtaining and determining unit 1044 calls the macro program to pick out the variable with the highest information value in each cluster. These variables are determined as the screened variables and output, so that the user can substitute them into a model for further modeling analysis. It should be noted that the purpose of screening the variables to be processed is to reduce their dimensionality.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The steps in the methods of the embodiments of the present invention may be reordered, combined, or deleted according to actual needs.
The modules or units in the terminal of the embodiments of the present invention may be combined, divided, or deleted according to actual needs.
The modules or units described in the embodiments of the present invention may be implemented by a general-purpose integrated circuit, such as a CPU (Central Processing Unit), or by an ASIC (Application Specific Integrated Circuit).
What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the claims of the present invention; therefore, equivalent variations made according to the claims of the present invention still fall within the scope covered by the present invention.

Claims (10)


Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201410251500.2A (CN104077128B) | 2014-06-09 | 2014-06-09 | A kind of data processing method and device

Publications (2)

Publication Number | Publication Date
CN104077128A | 2014-10-01
CN104077128B | 2017-07-11

Family ID: 51598400

Legal Events

Code | Title
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
