Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The method embodiments provided in the present application may be executed in a computer terminal or a similar computing device. Taking execution on a computer terminal as an example, fig. 1 is a hardware structure block diagram of a computer terminal for a data processing method according to an embodiment of the present invention. As shown in fig. 1, the computer terminal may include one or more processors 102 (only one is shown in fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. In an exemplary embodiment, it may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the computer terminal. For example, the computer terminal may include more or fewer components than shown in fig. 1, or have a different configuration with functionality equivalent to, or greater than, that shown in fig. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the data processing method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to a computer terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission device 106 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the internet wirelessly.
At present, in existing application modeling, each application usually has its own independent flow structure, data hierarchy, calculation logic, and set of fixed-structure tables, and is calculated and scheduled separately. When a data table is created, each field is an independent primitive data type. This way of building tables is intuitive and convenient in later use, because the meaning of each field and of each data value is clear, but it also has limitations:
1) as the number of applications grows, the number of created tables grows with it, which raises the maintenance cost of the data tables; the growing number of field names also raises the risk of data confusion and leads to excessive memory usage;
2) the number of fields in a table structure is relatively fixed and hard to model; later field expansion requires repeated changes to the table structure, and once a dimension is added, consistency with historical data cannot be guaranteed, so the meaning of the original data table changes;
3) for applications that share the same data source, similar calculation logic, a similar data table structure, and the same execution cycle, building a separate table for each causes data redundancy, occupies excessive storage space, wastes resources, and increases development time and labor cost.
In order to solve the above problem, in the present embodiment, a data processing method is provided, and fig. 2 is a flowchart of the data processing method according to the embodiment of the present invention, where the flowchart includes the following steps:
step S202, a plurality of log data reported by a plurality of devices are obtained and stored in a source data layer;
step S204, obtaining a target application requirement from a pre-configured configuration table, wherein the configuration table stores a plurality of application requirements, and the application requirements are used for indicating index values of target devices obtained from the plurality of devices;
step S206, extracting data corresponding to the target application requirement from the plurality of log data according to the target application requirement, calculating the data corresponding to the target application requirement to determine a data result requested by the target application requirement, and storing the data result in an open data layer, so that a data demander obtains the data result from the open data layer, wherein an intermediate data result obtained in the process of calculating the data corresponding to the target application requirement is stored in an intermediate data layer.
Through the above steps, log data reported by a plurality of devices is obtained and stored in a source data layer; a target application requirement is obtained from a pre-configured configuration table; data corresponding to the target application requirement is extracted from the plurality of log data and calculated to determine the data result requested by the target application requirement; and the data result is stored in an open data layer so that a data demander can obtain it from the open data layer, with the intermediate data results produced during the calculation stored in an intermediate data layer. This technical solution addresses the problem that different devices each have independent data hierarchies, which makes the data hard to integrate and the storage hierarchy disordered, and it integrates the data of the different devices.
It can be understood that, in this embodiment, data flows from the source data layer, through the intermediate data layer, to the open data layer. For example, suppose three networked appliances, a washing machine, an air conditioner, and a water heater, each store the log data they generate in the source data layer. When the washing machine's water consumption and total water-use duration for one day are required, a configuration table X is configured in advance according to the user requirement. The log data generated by the washing machine is filtered against configuration table X to obtain filtered data A, which is stored in the intermediate data layer. Data A is then calculated according to the user requirement to obtain a data result, and the data result is stored in the open data layer so that a data demander can obtain it from the open data layer.
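The washing machine example can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions: the record layout, the attribute names (`water_use_l`, `water_minutes`), and the use of a plain sum as the calculation are hypothetical and are not taken from the embodiment.

```python
from collections import defaultdict

# Source data layer: hypothetical log records reported by three appliances.
source_layer = [
    {"device": "washing_machine", "attr": "water_use_l", "value": 40.0},
    {"device": "washing_machine", "attr": "water_use_l", "value": 35.0},
    {"device": "washing_machine", "attr": "water_minutes", "value": 12.0},
    {"device": "air_conditioner", "attr": "power_kwh", "value": 1.5},
    {"device": "water_heater", "attr": "water_use_l", "value": 60.0},
]

# Configuration table X (assumed shape): the device and attributes required.
config_table_x = {
    "device": "washing_machine",
    "attrs": {"water_use_l", "water_minutes"},
}


def filter_logs(logs, config):
    """Keep only the records named by the configuration table (filtered data A)."""
    return [
        r for r in logs
        if r["device"] == config["device"] and r["attr"] in config["attrs"]
    ]


def summarize(records):
    """Sum each attribute; a stand-in for the requirement-specific calculation."""
    totals = defaultdict(float)
    for r in records:
        totals[r["attr"]] += r["value"]
    return dict(totals)


intermediate_layer = filter_logs(source_layer, config_table_x)  # filtered data A
open_layer = summarize(intermediate_layer)                      # data result
print(open_layer)
```

Keeping the filtered records separate from the final totals mirrors the intermediate and open data layers described above.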
In step S206, calculating the data corresponding to the target application requirement to determine the data result requested by the target application requirement includes, in an optional embodiment: acquiring the calculation logic of the indexes of the plurality of devices, and generating, according to the calculation logic, a calculation template for the indexes that have the same calculation logic; calculating partial data in the data according to the calculation template to obtain a first calculation result; determining a target algorithm for the other data in the data besides the partial data, and calculating the other data according to the target algorithm to obtain a second calculation result; and determining the data result according to the first calculation result and the second calculation result.
In this embodiment, different devices have their own algorithms and logic, but they often share common parts. Following a classification principle, indexes that have the same calculation logic are written into a template built from a fixed data source, Structured Query Language (SQL), and an algorithm. The differing parts are handled through an Azkaban task file or another scheduling system, so that the indexes of different devices are calculated from the template through parameter configuration. Building the model is a cumulative process: the number of templates grows as devices are added. If a device contains logic that cannot be classified, that logic must be calculated separately. For example, suppose the water consumption of the washing machine over a period of time is required. The log data generated by the washing machine is filtered against configuration table X to obtain filtered data A related to the water consumption, which contains subdata a, subdata b, subdata c, subdata d, subdata e, and so on. The water consumption is then calculated as follows: step one: subdata a and subdata b are calculated with algorithm 1 to obtain a first calculation result; step two: subdata c, subdata d, and subdata e are calculated with algorithm 2 to obtain a second calculation result; step three: the target data result is determined according to the first calculation result and the second calculation result.
Because the water consumption of the water heater over a period of time sometimes also needs to be obtained, the indexes that share calculation logic between the washing machine and the water heater can be classified to generate a calculation template M for those indexes. Subdata a and subdata b are calculated with calculation template M (fixed calculation logic) to obtain the first calculation result; subdata c, subdata d, and subdata e are calculated with the separate calculation logic, algorithm 2, to obtain the second calculation result; and the water consumption of the washing machine over the period is determined from the first calculation result and the second calculation result.
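A minimal Python sketch of this split between a shared template and device-specific logic follows. The embodiment does not say what template M or algorithm 2 actually compute, nor how the two results are combined, so the sum, the mean, and the addition below are assumptions chosen only to make the example concrete.

```python
def template_m(values):
    """Shared calculation template M (assumed here to be a plain sum)."""
    return sum(values)


def algorithm_2(values):
    """Device-specific logic (assumed here to be an arithmetic mean)."""
    return sum(values) / len(values)


def water_consumption(subdata):
    """Combine the template result and the device-specific result."""
    first = template_m([subdata["a"], subdata["b"]])
    second = algorithm_2([subdata["c"], subdata["d"], subdata["e"]])
    # The combination rule is an assumption; the source only says the
    # result is determined from both calculation results.
    return first + second


# Hypothetical filtered data A for the washing machine.
filtered_a = {"a": 10.0, "b": 5.0, "c": 3.0, "d": 6.0, "e": 9.0}
print(water_consumption(filtered_a))
```

Because `template_m` takes only values, the same function could be reused for the water heater's indexes that share this logic, while `algorithm_2` stays specific to one device.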
Some relevant networked-appliance models are listed below: (1) the fixed-source single-attribute index model, shown in table 1, is implemented as fixed calculation logic + configuration table + unified intermediate table; (2) the fixed-source multi-attribute index model, shown in table 2, is implemented as fixed calculation logic + configuration table + unified intermediate table; (3) the non-fixed data source model, shown in table 3, is implemented as separate calculation logic + configuration table + unified intermediate table.
TABLE 1
| Description | Example |
| Fixed data source | Tablename |
| Attribute | Water flow (currentWaterFlux) |
| Index value | 50L |
| Index code | 0101050204 |
TABLE 2
TABLE 3
| Description | Example |
| Non-fixed data source | Other data tables |
| Result data reported after calculation | Contains attribute detail data or summary data |
| Index code | 0101050204 |
In step S204, before obtaining the target application requirement from the pre-configured configuration table, optionally, the method further includes configuring the configuration table as follows: a target object performs a configuration operation on the configuration table to determine the configuration table, wherein the configuration operation includes at least one of: a defining operation, an adding operation, and a deleting operation. In this embodiment, data configuration serves to filter and supplement data throughout application modeling, and the configuration data is usually stored in a data table. Within the data layering, a configuration table may sit in any vertical layer, such as the source data layer or the intermediate data layer; horizontally, it generally sits in the permanent time dimension and is used by associating it with the source data or the intermediate data. One application may have one or more configuration tables. The data in a configuration table is designed and defined by the data development engineer according to the application requirements, and entries are subsequently added or deleted as needed. Examples of configuration data tables and their data are listed below, where table 4 is an attribute configuration table and table 5 is an index configuration table.
TABLE 4
TABLE 5
| Field name | Field type | Field comment | Example | Remarks |
| appCode | string | Application coding | eth_month_report | |
| appName | string | Application name | Monthly report of electric water heater | |
| indexCode | string | Index coding | 0101050204 | Coding of indicators |
| indexUnit | string | Index unit | ℃ | |
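As a rough illustration of the defining, adding, and deleting operations on an index configuration table, the Python sketch below models one row with the four fields listed in table 5. The `ConfigTable` class, its method names, and the choice of (appCode, indexCode) as the row key are hypothetical; the embodiment stores this data in a data table rather than in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexConfig:
    """One row of the index configuration table (field names from table 5)."""
    appCode: str
    appName: str
    indexCode: str
    indexUnit: str


class ConfigTable:
    """Hypothetical in-memory stand-in for a configuration table."""

    def __init__(self):
        self._rows = {}  # defining operation: start with an empty table

    def add(self, row):
        # Keyed by (appCode, indexCode); this key choice is an assumption.
        self._rows[(row.appCode, row.indexCode)] = row

    def delete(self, app_code, index_code):
        self._rows.pop((app_code, index_code), None)

    def for_app(self, app_code):
        return [r for r in self._rows.values() if r.appCode == app_code]


table = ConfigTable()
table.add(IndexConfig("eth_month_report",
                      "Monthly report of electric water heater",
                      "0101050204", "℃"))
print(len(table.for_app("eth_month_report")))
table.delete("eth_month_report", "0101050204")
print(len(table.for_app("eth_month_report")))
```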
In step S206, extracting the data corresponding to the target application requirement from the plurality of log data according to the target application requirement includes, in an optional embodiment: determining a data type of each log data of the plurality of log data, wherein the data type includes at least one of a fixed data source and a non-fixed data source, the fixed data source indicating log data to be reported by the plurality of devices and the non-fixed data source indicating historical log data stored in the plurality of devices; when the data type is a fixed data source, performing a query-and-select filtering operation on the plurality of log data according to the index data corresponding to the target application requirement, so as to extract the data corresponding to the target application requirement from the plurality of log data; and when the data type is a non-fixed data source, using the data directly for the application without the filtering operation.
In this embodiment, a fixed data source refers to detail data that is landed in Hive from the reported data. Its table name and structure are fixed, it involves no other calculation, and its origin is clear, so it can serve as the source data for a certain type of application and as the fixed data source for subsequent calculations. A non-fixed data source refers to result data that has already passed through other logic calculations and is used as an application data source. Its source table is not fixed and its table structure is not fixed, so before use the upstream calculation logic must be sorted out and the accuracy of the data confirmed; compared with a fixed data source, it requires an additional data confirmation step before use.
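The branch between the two data source types can be sketched as follows in Python. The record layout, the `"fixed"` and `"non_fixed"` labels, and filtering by attribute name are assumptions made for illustration; in the embodiment the filtering is a Hive SQL join against a configuration table.

```python
def extract(records, source_type, wanted_attrs):
    """Return the records an application should compute on.

    fixed source     -> filter the records by the configured attributes
    non-fixed source -> use the records as they are, with no filtering
    """
    if source_type == "fixed":
        return [r for r in records if r["attr"] in wanted_attrs]
    if source_type == "non_fixed":
        return list(records)
    raise ValueError(f"unknown data source type: {source_type!r}")


# Hypothetical reported detail data.
logs = [
    {"attr": "currentWaterFlux", "value": "50"},
    {"attr": "doorState", "value": "open"},
]
print(extract(logs, "fixed", {"currentWaterFlux"}))
print(extract(logs, "non_fixed", {"currentWaterFlux"}))
```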
After step S206 stores the data result in the open data layer, optionally, the method further includes: receiving, through the open data layer, a query request sent by a target object, wherein the query request is used for acquiring a target data result among the data results; and, in response to the query request, feeding the target data result back to the target object when the target object is determined to have query authority. In this embodiment, because opening data involves data permissions, the opened data is controlled through the index configuration table. The data summary table and the index configuration table are joined on the index code to implement open control, and after the join the result data is written into a data table of the Hive open data layer. The index code is bound to the application id in the index configuration table, and the application id is passed into the SQL execution program as a parameter through the Azkaban task file. The data of the open layer can be synchronized into a relational database such as MySQL through DataX or other related components and provided for others to use.
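The open-control join and the permission check can be illustrated with a small Python sketch. Everything here is hypothetical scaffolding: the row layouts, the second index code `0101050299`, the requester names, and the `authorized` set are invented for the example, and the real join is described as Hive SQL.

```python
# Index summary rows; "0101050299" is a made-up code that is not opened.
summary_table = [
    {"indexCode": "0101050204", "indexValue": "50"},
    {"indexCode": "0101050299", "indexValue": "7"},
]

# Index configuration rows binding index codes to an application.
index_config = [
    {"appCode": "eth_month_report", "indexCode": "0101050204"},
]


def open_data(summary_rows, config_rows, app_code):
    """Inner join on indexCode: only configured indexes reach the open layer."""
    allowed = {c["indexCode"] for c in config_rows if c["appCode"] == app_code}
    return [r for r in summary_rows if r["indexCode"] in allowed]


def query(open_rows, requester, authorized, index_code):
    """Return the target data result only if the requester has query authority."""
    if requester not in authorized:
        raise PermissionError(f"{requester} has no query authority")
    return [r for r in open_rows if r["indexCode"] == index_code]


open_layer = open_data(summary_table, index_config, "eth_month_report")
print(query(open_layer, "analyst_a", {"analyst_a"}, "0101050204"))
```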
For better understanding, in an alternative embodiment, the source data layer corresponds to one or more first data tables, the intermediate data layer corresponds to one or more second data tables, and the open data layer corresponds to one or more third data tables, wherein the table structures of the first data tables, the second data tables, and the third data tables are the same.
In this embodiment, obtaining a required data result during data development often takes several processing stages, so a data layering approach is adopted. Fig. 3 is a data layering structure diagram of the data processing method according to the embodiment of the present invention. Data layering is a logical concept in the application data development model. The layering in this model uses a mesh structure in which vertical and horizontal layers intersect. The vertical layers are divided mainly by the calculation state of a certain type of application data, and the horizontal layers are divided mainly by the time dimension of the data calculation. Vertically, the data is divided into three or more layers, namely a source data layer, an intermediate data layer, and an open data layer. The source data layer holds data that has not been calculated; it is the application data source and the basis for subsequent calculation. The intermediate data layer holds process data that has been calculated but whose calculation is not yet complete. The open data layer holds the results obtained after all calculations are finished and can provide data for subsequent users. The number of data layers can be increased or decreased according to business needs. Horizontally, the dimensions are a permanent time dimension, a day time dimension, and a month time dimension; a week time dimension, a year time dimension, and so on may also be added. The vertical data flow runs from bottom to top and the horizontal data flow runs from left to right, forming a crossed directed acyclic flow structure.
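One way to picture the crossed directed acyclic flow structure is as a grid whose nodes are (layer, time dimension) pairs. The Python sketch below builds such a grid and checks that it has a valid topological order. The three-by-three grid and the rule that every node reads from the node below it and the node to its left are simplifying assumptions; the embodiment does not state that every such edge exists.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from itertools import product

layers = ["source", "intermediate", "open"]   # vertical: bottom to top
time_dims = ["permanent", "day", "month"]     # horizontal: left to right

# Map each node to the set of nodes it reads from (its predecessors).
predecessors = {}
for i, j in product(range(len(layers)), range(len(time_dims))):
    preds = set()
    if i > 0:
        preds.add((layers[i - 1], time_dims[j]))   # flow from the layer below
    if j > 0:
        preds.add((layers[i], time_dims[j - 1]))   # flow from the left
    predecessors[(layers[i], time_dims[j])] = preds

# static_order() raises graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(predecessors).static_order())
edge_count = sum(len(p) for p in predecessors.values())
print(order[0], order[-1], edge_count)
```

Any schedule that follows `order` computes a table only after the tables it depends on, which is the property a scheduler needs from the layering.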
The first, second, and third data tables have the same table structure; a unified data table structure is an important part of application data warehouse modeling. Each data layer contains a plurality of Hive data tables, and after the layers are divided, the number of data tables, the table names, the number of table fields, the field names, and the field types in each horizontal and vertical layer must be determined. Tables are built by combining primitive data types with complex data types: using the array<string> and map<string,string> types provided by the Hive database system, some dimension or index fields are encoded and merged into an array or into key:value pairs, giving a complex table structure. In this way data dimensions and indexes can be extended without affecting the meaning of historical data. Only one data table, or one type of data table, needs to be created, which removes the need to create separate data tables for different applications, avoids the confusion caused by too many table names and field names, and makes different applications more uniform and easier to manage. For example, the structure of a database table can be as shown in table 6 below:
TABLE 6
It is to be understood that the above-described embodiments are only a few, but not all, embodiments of the present invention. In order to better understand the data processing method, the following describes the above process with reference to an embodiment, but the process is not limited to the technical solution of the embodiment of the present invention, and specifically:
fig. 4 is a flow chart of a data processing method for data warehouse modeling according to an embodiment of the present invention. As shown in fig. 4, in an alternative embodiment, the data source table of the source data layer and the configuration table of the permanent time dimension are first determined according to the service requirement. If the data source is a fixed data source, a Hive SQL join filtering operation is performed between the source data and the attribute data of the devices in the configuration table to obtain the data the devices require, and the result data is written into a Hive data table in the intermediate layer that has the same structure as the source data table, to serve as the data for subsequent calculation. In the SQL statement of the filtering process, the partition field of the source data and the device code in the configuration table appear as parameters, and the specific partition field value and device code are written into the Azkaban task file, so that the SQL statement is reusable and can serve as a template for subsequent devices. If the data source is a non-fixed data source, filtering against the configuration table is not needed and the data is used directly for the device.
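The idea of keeping the partition value and the device code out of the SQL text can be sketched with Python's `string.Template`. This is an illustration only: the table names, column names (`attr_code`, `device_code`, `dt`), and parameter names are hypothetical, and the `job_params` dictionary merely stands in for values that the embodiment writes into an Azkaban task file.

```python
from string import Template

# Reusable filter statement; every $name is supplied per device at run time.
FILTER_SQL = Template(
    "SELECT s.*\n"
    "FROM $source_table s\n"
    "JOIN $config_table c ON s.attr_code = c.attr_code\n"
    "WHERE s.dt = '$dt' AND c.device_code = '$device_code'"
)

# Stand-in for the parameters of one scheduled job (all values hypothetical).
job_params = {
    "source_table": "ods_device_log",
    "config_table": "cfg_attribute",
    "dt": "2024-01-01",
    "device_code": "WM001",
}

sql = FILTER_SQL.substitute(job_params)
print(sql)

# Reusing the same template for another device only changes the parameters.
sql_other = FILTER_SQL.substitute({**job_params, "device_code": "WH002"})
```

`substitute` raises `KeyError` when a parameter is missing, which surfaces an incomplete job configuration before any query runs.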
After the data source and the filtering operation are determined, the time dimension of the intermediate layer table must be determined and the table fields divided into primitive data types and complex data types. For example, the Id, wifiType, dimCode, and indexCode fields, of type string, are primitive data types, and the dimValue field, of type map<string,string>, is a complex data type. A physical data table is then created in Hive using SQL.
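To show why a map-typed field lets dimensions grow without a schema change, the Python sketch below models one intermediate-table row with the field names mentioned above. The sample values and the `mode` and `region` dimension keys are hypothetical, and a dataclass is only an in-memory analogy for the Hive table.

```python
from dataclasses import dataclass, field, fields
from typing import Dict


@dataclass
class IntermediateRow:
    # Primitive (string) fields.
    Id: str
    wifiType: str
    dimCode: str
    indexCode: str
    # Complex field, analogous to map<string,string> in Hive.
    dimValue: Dict[str, str] = field(default_factory=dict)


# A historical row with one dimension.
old_row = IntermediateRow("1", "wm", "d01", "0101050204", {"mode": "quick"})

# A newer row carries an extra dimension; the field list is unchanged.
new_row = IntermediateRow("2", "wm", "d02", "0101050204",
                          {"mode": "quick", "region": "north"})

print([f.name for f in fields(IntermediateRow)])
print(old_row.dimValue.get("region"), new_row.dimValue.get("region"))
```

The old row is still valid and keeps its meaning; readers that do not know about the new key simply never look it up.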
Next, according to the service requirements, the number of attributes needed to calculate each index is determined and the calculation logic is classified. If the data source is a fixed data source, the indexes can be divided into fixed-source single-attribute indexes, fixed-source multi-attribute indexes, and so on; if the data source is not fixed, or the index involves too many attributes or special handling, separate calculation logic is required. After the calculation logic is classified, a fixed data source is calculated on the filtered intermediate-layer data by combining SQL with Azkaban task files, with the index codes or other condition strings written into the task file as parameters. For example, '01050501', '01050502', and '01050503' form one calculation logic template, and other devices with the same calculation logic can reuse the template. The calculated data is no longer distinguished by data source, and the calculation results are written into the intermediate-layer data table of the corresponding time dimension, such as hour, day, or month.
If the required result data has a fine time granularity, the subsequent vertical logic calculation can be carried out directly. If the result data is daily data, the hourly intermediate-layer data can be summarized horizontally into the daily data table. If the result data is monthly data, the daily data table can be summarized into the monthly table, which retains intermediate process data for later troubleshooting. Even when data can be output directly to the daily intermediate table without an hourly stage, it is still recommended that monthly data be obtained by summarizing the daily data table. Splitting a whole month of data into daily calculations that are then summarized avoids the problems caused by an excessive data volume, such as insufficient computing resources, excessive server load, and overly long calculation time.
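The hour-to-day-to-month summarization can be sketched as two applications of the same roll-up function. The Python below is a toy illustration: the string keys, the sample values, and summation as the aggregate are assumptions, and the real tables are Hive tables rather than dictionaries.

```python
from collections import defaultdict


def roll_up(table, key_fn):
    """Sum values after mapping each key to a coarser time bucket."""
    out = defaultdict(float)
    for key, value in table.items():
        out[key_fn(key)] += value
    return dict(out)


# Hypothetical hourly intermediate data keyed by "YYYY-MM-DD HH".
hourly = {
    "2024-01-01 08": 2.0,
    "2024-01-01 20": 3.0,
    "2024-01-02 09": 4.0,
    "2024-02-01 10": 1.0,
}

daily = roll_up(hourly, lambda k: k[:10])    # "YYYY-MM-DD"
monthly = roll_up(daily, lambda k: k[:7])    # "YYYY-MM"
print(daily)
print(monthly)
```

Building `monthly` from `daily` rather than from `hourly` keeps the daily table as a retained intermediate result and keeps each step's input small.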
In the calculated data, the same id may carry one or more indexes, and each index exists as a key-value pair of the form {index code: index value}. This data must be split so that each index is recorded on its own line. The index code string field can be split with the SQL clause lateral view explode(split([index_cd], ',')) s2 as [index_cd], and the index value can then be looked up with index_value[s2.index_cd]. The resulting data is written into the index summary data table in Hive. This process does not distinguish data sources and is reusable: because the intermediate table structure is fixed, the target table structure is fixed, and the SQL logic of index splitting is fixed, subsequent devices can continue to use it.
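The effect of the index-splitting step can be mirrored in Python to make the row-per-index result concrete. This sketch assumes each input row has a comma-separated `index_cd` string and an `index_value` mapping keyed by index code; the `id` value and the sample numbers are hypothetical.

```python
def split_indexes(rows):
    """Emit one output row per index code, like explode(split(index_cd, ','))."""
    out = []
    for row in rows:
        for code in row["index_cd"].split(","):
            out.append({
                "id": row["id"],
                "index_cd": code,
                # Same lookup as index_value[s2.index_cd] in the SQL form.
                "index_value": row["index_value"][code],
            })
    return out


rows = [
    {
        "id": "dev1",
        "index_cd": "01050501,01050502",
        "index_value": {"01050501": "50", "01050502": "12"},
    }
]

for r in split_indexes(rows):
    print(r)
```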
The data obtained by splitting and summarizing the indexes is the result the user requires, and the next step, providing the calculated result, is the data opening process. Because opening data involves data permissions, the opened data is controlled through the index configuration table. The data summary table and the index configuration table are joined on the index code to implement open control, and after the join the result data is written into a data table of the Hive open data layer. The index code is bound to the device id in the index configuration table, and the device id is passed into the SQL execution program as a parameter through the Azkaban task file. The logic of the opening process is also fixed and can be reused by other devices. The data of the open layer can be synchronized into a relational database such as MySQL through DataX or other related components for others to use.
In addition, the technical solution of the embodiment of the present invention classifies data sources and uses a mesh layered structure in which the horizontal and vertical data logic intersect. The intermediate-layer table structure is built by combining primitive data types with complex data types, which integrates the devices into a unified intermediate table and uses index codes to avoid device-specific fields. Calculation logic is classified and turned into templates, and different devices and indexes are set up through configuration tables. This addresses the maintenance difficulty and cost that arise as the number of data tables grows with the number of devices, and avoids the growth in metadata produced by table creation and the excessive occupation of cluster memory. It also improves the extensibility of data dimensions and indexes, lowers the difficulty of device modeling, shortens the development cycle, improves development efficiency, reduces development cost, and increases the reusability of data and calculations. Because the process is modular, the development mode is more flexible.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a data processing apparatus is further provided, and the apparatus is used to implement the foregoing embodiments and preferred embodiments, and details are not repeated for what has been described. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated.
Fig. 5 is a block diagram of a data processing apparatus according to an embodiment of the present invention, the apparatus including:
a first obtaining module 52, configured to obtain multiple log data reported by multiple devices, and store the multiple log data in a source data layer;
a second obtaining module 54, configured to obtain a target device requirement from a pre-configured configuration table, where the configuration table stores a plurality of device requirements, and the device requirement is used to indicate to obtain an index value of a target device from the plurality of devices;
the processing module 56 is configured to extract data corresponding to the target device requirement from the plurality of log data according to the target device requirement, calculate the data corresponding to the target device requirement to determine a data result requested by the target device requirement, and store the data result in an open data layer, so that a data demander obtains the data result from the open data layer, where an intermediate data result obtained in a process of calculating the data corresponding to the target device requirement is stored in an intermediate data layer.
According to the invention, a data processing apparatus is provided. Log data reported by a plurality of application devices is obtained to yield a plurality of log data, which is stored in a source data layer. A target application requirement is then obtained from a pre-configured configuration table, the data corresponding to the target application requirement is extracted from the plurality of log data according to that requirement, and the extracted data is calculated to determine the data result requested by the target application requirement. The data result is stored in an open data layer so that a data demander can obtain it from the open data layer, and any intermediate data result obtained during the calculation is stored in an intermediate data layer. This technical solution solves the problem that different devices each have an independent data hierarchy, which makes the data difficult to integrate and the storage hierarchy disordered; with the data processing method, the data of the different devices is integrated.
It can be understood that, in this embodiment, the reported log data is first obtained from a plurality of application devices, and the resulting plurality of log data is stored in a source data layer. A target application requirement is then obtained from a pre-configured configuration table. Next, the data corresponding to the target application requirement is extracted from the plurality of log data according to that requirement and calculated to determine the data result requested by the target application requirement. The data result is stored in an open data layer so that a data demander can obtain it from the open data layer, and any intermediate data result obtained during the calculation is stored in an intermediate data layer. For example: a networked washing machine, a networked air conditioner and a networked water heater each store the log data they generate in the source data layer. When the water consumption and the total water-use duration of the washing machine in one day need to be obtained, a configuration table X is configured in advance according to the user requirement. The log data generated by the washing machine is filtered against configuration table X to obtain filtered data A, which is stored in the intermediate data layer. The corresponding logic calculation is then performed on data A according to the user requirement to obtain a data result, which is stored in the open data layer so that a data demander can obtain it from the open data layer.
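The washing machine example above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the three layers are modeled as in-memory containers rather than Hive tables, and the record fields (`device`, `day`, `water_l`, `duration_min`), the layout of `CONFIG_TABLE_X` and all function names are hypothetical and do not appear in the embodiment.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the three data layers.
source_layer = []        # raw log records exactly as reported
intermediate_layer = []  # filtered data A, not yet aggregated
open_layer = {}          # final results read by data demanders

# Hypothetical layout of configuration table X: the device of interest
# and the log fields needed for the requested indexes.
CONFIG_TABLE_X = {"device": "washing_machine", "fields": ("water_l", "duration_min")}


def ingest(logs):
    """Store reported log records unchanged in the source data layer."""
    source_layer.extend(logs)


def filter_by_config(config):
    """Filter the source layer against the configuration; returns data A."""
    data_a = [
        {"day": rec["day"], **{f: rec[f] for f in config["fields"]}}
        for rec in source_layer
        if rec["device"] == config["device"]
    ]
    intermediate_layer.extend(data_a)
    return data_a


def publish_daily_totals(data_a, fields):
    """Sum each configured field per day and store the result in the open layer."""
    totals = defaultdict(lambda: dict.fromkeys(fields, 0))
    for rec in data_a:
        for f in fields:
            totals[rec["day"]][f] += rec[f]
    open_layer.update(totals)


ingest([
    {"device": "washing_machine", "day": "2024-01-01", "water_l": 50, "duration_min": 40},
    {"device": "washing_machine", "day": "2024-01-01", "water_l": 45, "duration_min": 30},
    {"device": "air_conditioner", "day": "2024-01-01", "power_kwh": 3},
    {"device": "water_heater", "day": "2024-01-01", "water_l": 30, "duration_min": 15},
])
data_a = filter_by_config(CONFIG_TABLE_X)
publish_daily_totals(data_a, CONFIG_TABLE_X["fields"])
print(open_layer)
```

In this sketch the data demander only ever reads `open_layer`; the raw records and filtered data A stay in their own layers.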
In an alternative embodiment, the processing module 56 is configured to: acquire the calculation logic of the indexes of the plurality of devices, and, for indexes having the same calculation logic, generate a calculation template corresponding to those indexes according to the calculation logic; calculate partial data in the data according to the calculation template to obtain a first calculation result; determine a target algorithm for the other data in the data, other than the partial data, and calculate the other data according to the target algorithm to obtain a second calculation result; and determine the data result according to the first calculation result and the second calculation result.
In this embodiment, different devices have their own algorithms and logic, but different devices often share some common parts. Following a classification principle, indexes having the same calculation logic are written into a template built from a fixed data source, Structured Query Language (SQL) and an algorithm. The differing parts are handled together with an Azkaban task file or another scheduling system, so that the indexes of different devices are calculated from templates through parameter configuration. Writing templates is a cumulative process: the number of templates grows gradually as devices are added. If a device contains logic that cannot be classified, that logic requires a targeted, separate calculation. For example: to obtain the water consumption of the washing machine over a period of time, the log data generated by the washing machine is filtered against configuration table X to obtain filtered data A related to that water consumption, where filtered data A contains a plurality of sub-data: sub-data a, sub-data b, sub-data c, sub-data d, sub-data e, and so on. The water consumption of the washing machine is then calculated as follows. Step one: calculate sub-data a and sub-data b using algorithm 1 to obtain a first calculation result. Step two: calculate sub-data c, sub-data d and sub-data e using algorithm 2 to obtain a second calculation result. Step three: determine the target data result according to the first calculation result and the second calculation result.
Because the water consumption of the water heater over a period of time sometimes also needs to be obtained, the indexes that share the same calculation logic between the washing machine and the water heater can be classified to generate a calculation template M corresponding to those indexes. Sub-data a and sub-data b are calculated with calculation template M (fixed calculation logic) to obtain a first calculation result, sub-data d and sub-data e are calculated with the separate calculation logic of algorithm 2 to obtain a second calculation result, and the water consumption of the washing machine over the period of time is determined according to the first calculation result and the second calculation result.
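The split between a shared calculation template and a device-specific algorithm can be sketched in Python as follows. The embodiment does not state what template M or algorithm 2 actually compute, nor how the two results are combined, so the summation in `template_m`, the scaling in `algorithm_2` and the final addition are assumptions chosen only to make the sketch concrete; all names and values are hypothetical.

```python
def template_m(values):
    """Hypothetical shared template M: fixed logic reused across devices.

    Assumed here to be a plain sum of the supplied sub-data.
    """
    return sum(values)


def algorithm_2(values, factor):
    """Hypothetical device-specific algorithm: a sum scaled by a device factor."""
    return sum(values) * factor


def water_consumption(sub_data, shared_keys, specific_keys, factor):
    """Combine the template result and the device-specific result.

    Adding the two results is an assumption; the embodiment only says the
    data result is determined from both.
    """
    first_result = template_m([sub_data[k] for k in shared_keys])
    second_result = algorithm_2([sub_data[k] for k in specific_keys], factor)
    return first_result + second_result


# Hypothetical sub-data taken from filtered data A for the washing machine.
washer_sub_data = {"a": 10, "b": 20, "c": 1, "d": 2, "e": 3}
washer_total = water_consumption(
    washer_sub_data, shared_keys=("a", "b"), specific_keys=("c", "d", "e"), factor=2
)
print(washer_total)  # 30 from the template + 12 from algorithm 2 = 42

# The same template is reused unchanged for a water heater's shared indexes.
heater_shared = template_m([5, 7])
print(heater_shared)  # 12
```

Only the parameters differ between devices; adding a device with the same shared logic requires no new template.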
Some related networked-device models are listed below: (1) the fixed-source single-attribute index model, shown in Table 1, is implemented as fixed calculation logic + configuration table + unified intermediate table; (2) the fixed-source multi-attribute index model, shown in Table 2, is implemented as fixed calculation logic + configuration table + unified intermediate table; (3) the non-fixed data source model, shown in Table 3, is implemented as separate calculation logic + configuration table + unified intermediate table.
The second obtaining module 54 is further configured to configure the configuration table as follows: a target object performs a configuration operation on the configuration table to determine the configuration table, where the configuration operation includes at least one of: a defining operation, an adding operation and a deleting operation. In this embodiment, data configuration serves to filter and supplement data throughout application modeling, and the configuration data is usually stored in a data table. Within the data layering, configuration data may appear vertically in any layer, such as the source data layer or the intermediate data layer, and horizontally it generally appears in the permanent time dimension, where it is used in association with the source data or the intermediate data. One application may have one or more configuration tables. The data in a configuration table is designed and defined by the data development engineer according to the application requirements, and is subsequently added to and deleted from as needed. Configuration data tables and data examples are listed below, where Table 4 is an attribute configuration table and Table 5 is an index configuration table.
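The three configuration operations named above (define, add, delete) can be sketched in Python as follows. The class `ConfigTable`, its column names and the sample rows are hypothetical and are not taken from Table 4 or Table 5; in practice the configuration data would be stored in a data table rather than in memory.

```python
class ConfigTable:
    """Hypothetical in-memory configuration table."""

    def __init__(self, name, columns):
        # Defining operation: fix the table name and its columns.
        self.name = name
        self.columns = tuple(columns)
        self.rows = []

    def add(self, **row):
        # Adding operation: a row must supply exactly the defined columns.
        if set(row) != set(self.columns):
            raise ValueError(f"row must have columns {self.columns}")
        self.rows.append(row)

    def delete(self, **match):
        # Deleting operation: remove every row matching all given values.
        if not match:
            raise ValueError("at least one match criterion is required")
        before = len(self.rows)
        self.rows = [
            r for r in self.rows
            if not all(r.get(k) == v for k, v in match.items())
        ]
        return before - len(self.rows)


# Hypothetical index configuration entries.
index_config = ConfigTable("index_config", ["app_id", "index_code", "index_name"])
index_config.add(app_id="washer", index_code="W001", index_name="daily water use")
index_config.add(app_id="washer", index_code="W002", index_name="daily duration")
removed = index_config.delete(index_code="W002")
print(removed, index_config.rows)
```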
In an alternative embodiment, the processing module 56 is further configured to: determine a data type of each log data of the plurality of log data, where the data type includes at least one of a fixed data source and a non-fixed data source, the fixed data source indicating log data to be reported by the plurality of devices and the non-fixed data source indicating historical log data stored in the plurality of devices; and, when the data type is a fixed data source, perform a query-and-select filtering operation on the plurality of log data according to the index data corresponding to the target application requirement, so as to extract the data corresponding to the target application requirement from the plurality of log data. If the data type is a non-fixed data source, no filtering operation is required and the log data is used directly by the application.
In this embodiment, a fixed data source refers to the detail data obtained when the reported data is persisted in Hive. The table name and structure are fixed, no other calculation is involved, and the origin of the data is clear, so it can serve as the source data for a certain type of application and as a fixed data source for subsequent calculations. A non-fixed data source refers to result data that has already passed through other calculation logic and is then used as an application data source. The data source table is not determined in advance and its table structure is not fixed, so the preceding calculation logic must be clarified and the data accuracy confirmed before use. Compared with a fixed data source, a non-fixed data source therefore requires an additional data confirmation step before use.
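The branch on data type described above can be sketched in Python as follows. The function name `extract_for_requirement`, the `index_code` field and the string labels `"fixed"` and `"non_fixed"` are hypothetical; the sketch also assumes that a non-fixed source has already passed its data confirmation step before this function is called.

```python
def extract_for_requirement(records, data_type, wanted_index_codes):
    """Return the records an application should calculate on.

    fixed     -> query-and-select filtering by the requested index codes
    non_fixed -> no filtering; the data is used directly
                 (assumed to have been confirmed upstream)
    """
    if data_type == "fixed":
        return [r for r in records if r["index_code"] in wanted_index_codes]
    if data_type == "non_fixed":
        return list(records)
    raise ValueError(f"unknown data type: {data_type!r}")


# Hypothetical log records carrying an index code.
logs = [
    {"index_code": "W001", "value": 50},
    {"index_code": "W002", "value": 40},
    {"index_code": "H001", "value": 30},
]
fixed_result = extract_for_requirement(logs, "fixed", {"W001", "W002"})
non_fixed_result = extract_for_requirement(logs, "non_fixed", {"W001"})
print(len(fixed_result), len(non_fixed_result))
```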
Optionally, the processing module 56 is further configured to: receive a query request sent by a target object through the open data layer, where the query request is used to acquire a target data result among the data results; and, in response to the query request, feed the target data result back to the target object when the target object is determined to have the query permission. In this embodiment, because opening data involves data permission issues, the opened data is managed and controlled through the index configuration table. The data summary table and the index configuration table are joined on the index code to implement open control, and the joined result data is written into a data table of the Hive open data layer. The index code is bound to the application id in the index configuration table, and the application id is passed as a parameter into the SQL execution program through the Azkaban task file. Data in the open layer can be synchronized to a relational database such as MySQL through DataX or other related components and provided for others to use.
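The open control described above, joining the summary data with the index configuration table on the index code for a given application id, can be sketched in Python as follows. In the embodiment this is a SQL join executed on Hive with the application id passed in from the Azkaban task file; the plain-Python inner join below is only an illustration, and the function name, field names and sample rows are hypothetical.

```python
def open_results(summary_rows, index_config_rows, app_id):
    """Inner-join summary rows to the index configuration on index_code.

    Only index codes bound to app_id in the configuration are opened.
    """
    allowed_codes = {
        cfg["index_code"] for cfg in index_config_rows if cfg["app_id"] == app_id
    }
    return [row for row in summary_rows if row["index_code"] in allowed_codes]


# Hypothetical data summary table and index configuration table.
summary = [
    {"index_code": "W001", "day": "2024-01-01", "value": 95},
    {"index_code": "W002", "day": "2024-01-01", "value": 70},
    {"index_code": "H001", "day": "2024-01-01", "value": 30},
]
index_config_rows = [
    {"app_id": "washer_app", "index_code": "W001"},
    {"app_id": "heater_app", "index_code": "H001"},
]
opened = open_results(summary, index_config_rows, app_id="washer_app")
print(opened)
```

Index W002 is present in the summary but not bound to any application in the configuration, so in this sketch it is never written to the open layer.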
For better understanding, in an alternative embodiment, the source data layer corresponds to one or more first data tables, the intermediate data layer corresponds to one or more second data tables, and the open data layer corresponds to one or more third data tables, wherein the table structures of the first data tables, the second data tables, and the third data tables are the same.
In this embodiment, obtaining a required data result during data development often takes several processing steps, so a data layering approach is adopted. Fig. 3 is a data layering structure diagram of the data processing method according to the embodiment of the present invention. Data layering is a logical concept in the application data development model. The layering in this model uses a grid structure in which vertical and horizontal layers intersect. The vertical layers are divided mainly by the calculation state of a certain type of application data, and the horizontal layers are divided mainly by the time dimension of the data calculation. Vertically, the data is divided into three or more layers: a source data layer, an intermediate data layer and an open data layer. The source data layer holds data that has not undergone any calculation; it serves as the application data source and as the basis for subsequent calculations. The intermediate data layer holds process data whose calculation has started but is not yet complete. The open data layer holds the results obtained after all calculations are finished and provides data for subsequent users. The number of data layers can be increased or decreased according to business needs. Horizontally, the layers are a permanent time dimension, a day time dimension and a month time dimension; a week time dimension, a year time dimension and so on may also be added. Data flows vertically from bottom to top and horizontally from left to right, forming a crossed directed acyclic flow structure.
The table structures of the first data table, the second data table and the third data table are the same. It can be understood that a unified data table structure is an important part of modeling the application data warehouse. Each layer of data comprises a plurality of Hive data tables, and once the layers have been divided, the number of data tables, the table names, the number of table fields, the field names and the field types in each horizontal and vertical layer need to be determined. Tables are built by combining primitive data types with complex data types: using the array<string> and map<string,string> types provided by the Hive database system, some of the dimension or index fields are encoded and merged into an array or into key:value pairs, forming a complex table structure. This allows data dimensions and indexes to be extended without affecting the meaning of the historical data. Only one data table, or one type of data table, needs to be created, which removes the need to create separate data tables for different applications and avoids the disorder caused by too many table names and field names. Different applications become more uniform and easier to manage. For example, the structure of a database table can be as shown in Table 6.
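Separately from Table 6, the idea of a unified table structure built from primitive and complex types can be illustrated with a minimal Python sketch. It is not the table structure of the embodiment: the class `UnifiedRow` and its field names are hypothetical, with a Python list standing in for Hive's array<string> and a Python dict standing in for map<string,string>.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class UnifiedRow:
    """Hypothetical row of a unified table shared by all applications."""

    app_id: str                                             # primitive column
    stat_date: str                                          # primitive column
    dims: List[str] = field(default_factory=list)           # stands in for array<string>
    metrics: Dict[str, str] = field(default_factory=dict)   # stands in for map<string,string>


# A historical row written when only one index existed.
old_row = UnifiedRow("washer_app", "2024-01-01", ["washing_machine"], {"water_l": "95"})

# A later row adds a new index as one more key; no column is added or changed.
new_row = UnifiedRow(
    "washer_app",
    "2024-01-02",
    ["washing_machine"],
    {"water_l": "80", "duration_min": "60"},
)

column_names = [f.name for f in fields(UnifiedRow)]
print(column_names)
print(old_row.metrics.get("duration_min"))  # None: the historical row is unaffected
```

Because new indexes become map keys rather than new columns, rows written before the extension keep their original meaning.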
Embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring a plurality of log data reported by a plurality of devices, and storing the plurality of log data in a source data layer;
S2, obtaining a target device requirement from a pre-configured configuration table, wherein the configuration table stores a plurality of device requirements, and each device requirement indicates that an index value of a target device among the plurality of devices is to be obtained;
S3, extracting data corresponding to the target device requirement from the plurality of log data according to the target device requirement, calculating the data corresponding to the target device requirement to determine a data result requested by the target device requirement, and storing the data result in an open data layer for a data demander to obtain the data result from the open data layer, wherein an intermediate data result obtained in the process of calculating the data corresponding to the target device requirement is stored in an intermediate data layer.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary embodiments, and details of this embodiment are not repeated herein.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a plurality of log data reported by a plurality of devices, and storing the plurality of log data in a source data layer;
S2, obtaining a target device requirement from a pre-configured configuration table, wherein the configuration table stores a plurality of device requirements, and each device requirement indicates that an index value of a target device among the plurality of devices is to be obtained;
S3, extracting data corresponding to the target device requirement from the plurality of log data according to the target device requirement, calculating the data corresponding to the target device requirement to determine a data result requested by the target device requirement, and storing the data result in an open data layer for a data demander to obtain the data result from the open data layer, wherein an intermediate data result obtained in the process of calculating the data corresponding to the target device requirement is stored in an intermediate data layer.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary embodiments, and details of this embodiment are not repeated herein.
It will be apparent to those skilled in the art that the various modules or steps of the invention described above may be implemented using a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and they may be implemented using program code executable by the computing devices, such that they may be stored in a memory device and executed by the computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into various integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.