Description	Example
Fixed data source	Tablename
Attribute	Water flow (currentWaterFlux)
Index value	50L
Index code	0101050204

TABLE 2

TABLE 3

Description	Example
Non-fixed data source	Other data tables
Reported result data after data calculation	Contains attribute detail data or summary data
Index code	0101050204

Before the target application requirement is obtained from the pre-configured configuration table in step S204, the method optionally further includes configuring the configuration table as follows: a target object performs a configuration operation on the configuration table to determine the configuration table, wherein the configuration operation includes at least one of: a defining operation, an adding operation, and a deleting operation. In this embodiment, data configuration filters and supplements data throughout application modeling, and the configuration data is usually stored in a data table. Within the data layering, configuration data may appear vertically in any layer, such as the source data layer or the intermediate data layer, and horizontally it generally sits in the permanent time dimension, where it is used in association with the source data or the intermediate data. One application may have one or more configuration tables. The data in a configuration table is designed, defined, and subsequently added or deleted by the data development engineer according to the application requirements. Examples of configuration tables and their data are listed below, where Table 4 is an attribute configuration table and Table 5 is an index configuration table.

TABLE 4

TABLE 5

Field name	Field type	Field comment	Example	Remarks
appCode	string	Application code	eth_month_report
appName	string	Application name	Monthly report of electric water heater
indexCode	string	Index code	0101050204	Code of the index
indexUnit	string	Index unit	℃
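As an illustration of the defining, adding, and deleting operations on a configuration table, the following is a minimal Python sketch. It assumes an in-memory stand-in for the table; the class names `IndexConfigRow` and `IndexConfigTable` and the choice of (appCode, indexCode) as the row key are hypothetical and are not part of the embodiment. Only the field names and example values come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexConfigRow:
    """One row of an index configuration table, using the fields listed above."""
    appCode: str
    appName: str
    indexCode: str
    indexUnit: str


class IndexConfigTable:
    """Hypothetical in-memory stand-in for a configuration table."""

    def __init__(self):
        # Defining operation: an empty table keyed by (appCode, indexCode).
        self._rows = {}

    def add(self, row):
        # Adding operation: insert or overwrite the row for this key.
        self._rows[(row.appCode, row.indexCode)] = row

    def delete(self, app_code, index_code):
        # Deleting operation: returns True only if a row was removed.
        return self._rows.pop((app_code, index_code), None) is not None

    def index_codes(self, app_code):
        # Index codes currently configured for one application.
        return sorted(r.indexCode for r in self._rows.values() if r.appCode == app_code)


table = IndexConfigTable()
table.add(IndexConfigRow("eth_month_report",
                         "Monthly report of electric water heater",
                         "0101050204", "℃"))
```

In practice the rows would live in a data table rather than in memory; the sketch only shows that the three operations are enough to maintain the set of index codes an application is allowed to use.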

In an optional embodiment, step S306 of extracting the data corresponding to the target application requirement from the plurality of log data according to the target application requirement includes: determining a data type of each log data of the plurality of log data, wherein the data type includes at least one of a fixed data source and a non-fixed data source, the fixed data source indicating log data to be reported by the plurality of devices, and the non-fixed data source indicating historical log data stored in the plurality of devices; and, when the data type is a fixed data source, performing a query-and-select filtering operation on the plurality of log data according to the index data corresponding to the target application requirement, so as to extract the data corresponding to the target application requirement from the plurality of log data. When the data type is a non-fixed data source, no filtering operation is required and the data is used directly for the application.

In this embodiment, a fixed data source refers to the detail data of the reported data as persisted in Hive. The table name and table structure are fixed, no other calculation is involved, and the origin of the data is clear, so it can serve as the source data for a certain type of application and as the fixed data source for subsequent calculations. A non-fixed data source refers to reported result data that has already passed through other logic calculations and is then used as an application data source. Its source table is not determined in advance and its table structure is not fixed, so the earlier calculation logic must be sorted out and the accuracy of the data confirmed before use. Compared with a fixed data source, a non-fixed data source therefore requires an additional data confirmation step before use.
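The branching on data source type can be sketched as follows. This is a minimal Python illustration, assuming each record is a dictionary with an `indexCode` key; the function name `extract_for_application` and the string labels `"fixed"` and `"non_fixed"` are hypothetical.

```python
def extract_for_application(records, source_type, wanted_index_codes):
    """Select the records an application needs, depending on the source type.

    fixed     -> keep only records whose index code is configured for the application
    non_fixed -> no filtering; the records are used directly
    """
    if source_type == "fixed":
        wanted = set(wanted_index_codes)
        return [r for r in records if r["indexCode"] in wanted]
    if source_type == "non_fixed":
        return list(records)
    raise ValueError(f"unknown source type: {source_type}")


# Hypothetical sample records; only the first index code appears in the text above.
sample = [
    {"indexCode": "0101050204", "value": "50L"},
    {"indexCode": "9999999999", "value": "n/a"},
]
```

With a fixed source and the single configured code `0101050204`, only the first sample record is kept; with a non-fixed source both records pass through unchanged.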

After the data result is stored in the open data layer in step S306, the method optionally further includes: receiving, through the open data layer, a query request sent by a target object, wherein the query request is used to obtain a target data result among the data results; and, in response to the query request, feeding the target data result back to the target object when the target object is determined to have query authority. In this embodiment, because opening data involves data permissions, the opened data is managed and controlled through the index configuration table. The data summary table and the index configuration table are joined on the index code to implement this open control, and the joined result data is written into a data table of the Hive open data layer. The index code is bound to the application id in the index configuration table, and the application id is passed to the SQL execution program as a parameter through the Azkaban task file. Data in the open layer can be synchronized to a relational database such as MySQL through DataX or other related components and made available to other users.
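The authority check on a query request might look like the sketch below. It assumes that query authority is represented as a mapping from a requester to the set of index codes that requester may read; this representation, the function name `handle_query`, and the requester names are assumptions made for illustration only.

```python
def handle_query(open_layer, granted, requester, index_code):
    """Return the open-layer rows for index_code if the requester has query authority."""
    if index_code not in granted.get(requester, set()):
        raise PermissionError(f"{requester} may not query {index_code}")
    return [row for row in open_layer if row["indexCode"] == index_code]


# Hypothetical open-layer content and grants.
open_layer = [{"indexCode": "0101050204", "value": "50L"}]
granted = {"analyst": {"0101050204"}}
```

A requester without a grant gets an error rather than an empty result, which keeps "no data" distinguishable from "no authority".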

For better understanding, in an alternative embodiment, the source data layer corresponds to one or more first data tables, the intermediate data layer corresponds to one or more second data tables, and the open data layer corresponds to one or more third data tables, wherein the table structures of the first data tables, the second data tables, and the third data tables are the same.

In this embodiment, obtaining a required data result during data development often takes several processing stages, so a data layering approach is adopted. Fig. 3 is a data layering structure diagram of the data processing method according to the embodiment of the present invention. Data layering is a logical concept in the application data development model. The layering in this model uses a grid structure in which vertical and horizontal layers intersect. The vertical layers are divided mainly by the calculation state of a certain type of application data, and the horizontal layers are divided mainly by the time dimension of the data calculation. Vertically, the data is divided into three or more layers: a source data layer, an intermediate data layer, and an open data layer. The source data layer holds data that has not undergone calculation; it serves as the application data source and as the basis for subsequent calculation. The intermediate data layer holds process data for which calculation has started but not yet finished. The open data layer holds the results obtained after all calculations are finished and provides data for subsequent users. The number of data layers can be increased or decreased according to business needs. Horizontally, the dimensions are a permanent time dimension, a day time dimension, and a month time dimension; a week time dimension, a year time dimension, and so on may also be added. Vertical data flows from bottom to top and horizontal data flows from left to right, forming a crossed directed acyclic flow structure.
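To make the "crossed directed acyclic flow structure" concrete, the following Python sketch builds the grid of layers and time dimensions and checks that the flow has no cycle. It assumes that data moves only one step up or one step to the right between adjacent cells, which is a simplification chosen for illustration; the function names are hypothetical.

```python
from itertools import product

VERTICAL = ["source", "intermediate", "open"]   # bottom to top
HORIZONTAL = ["permanent", "day", "month"]      # left to right


def flow_edges():
    """Edges of the layering grid: one step up or one step to the right."""
    edges = []
    for v, h in product(range(len(VERTICAL)), range(len(HORIZONTAL))):
        here = (VERTICAL[v], HORIZONTAL[h])
        if v + 1 < len(VERTICAL):
            edges.append((here, (VERTICAL[v + 1], HORIZONTAL[h])))
        if h + 1 < len(HORIZONTAL):
            edges.append((here, (VERTICAL[v], HORIZONTAL[h + 1])))
    return edges


def is_acyclic(edges):
    """Kahn's algorithm: the graph is acyclic if every node can be removed."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for src, dst in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return removed == len(nodes)
```

Because every edge moves strictly upward or strictly rightward, no path can return to its starting cell; adding a single edge from the top-right cell back to the bottom-left cell is enough to break that property.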

That the first, second, and third data tables share the same table structure reflects the fact that a unified data table structure is an important part of data warehouse modeling for applications. Each layer of data comprises a plurality of Hive data tables. After the layers are divided, the number of data tables, the table names, the number of table fields, the field names, and the field types in each horizontal and vertical layer need to be determined. Tables are built by combining primitive data types with complex data types: using the array<string> or map<string,string> types provided by the Hive database system, some dimension or index fields are encoded and merged into an array or into key:value pairs, producing a complex table structure. This allows data dimensions and indexes to be extended without affecting the meaning of historical data. Only one table, or one type of table, needs to be created, which removes the need to create separate data tables for different applications, avoids the confusion caused by an excess of table names and field names, and makes different applications more uniform and easier to manage. For example, the structure of a database table may be as shown in Table 6 below:

TABLE 6

It is to be understood that the above-described embodiments are only some, not all, of the embodiments of the present invention. For a better understanding of the data processing method, the above process is described below with reference to an embodiment, which is not intended to limit the technical solution of the embodiments of the present invention. Specifically:

Fig. 4 is a flow chart of a data processing method for data warehouse modeling according to an embodiment of the present invention. As shown in Fig. 4, in an optional embodiment, the data source table of the source data layer and the configuration table of the permanent time dimension are first determined according to the service requirement. If the data source is a fixed data source, a Hive SQL join filtering operation is performed between the source data and certain device attribute data in the configuration table to obtain the data the device requires, and the result data is written into an intermediate-layer Hive data table that has the same structure as the source data table, for use in subsequent calculation. In the SQL statement of the filtering process, the partition field of the source data and the device code in the configuration table appear as parameters, and the specific partition field value and device code are written into the Azkaban task file. This keeps the SQL statement reusable, so that it can serve as a template for subsequent devices. If the data source is a non-fixed data source, no filtering against the configuration table is needed and the data source is used directly for the device.
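The idea of keeping the filtering SQL as a reusable template, with the partition value and device code supplied from outside, can be sketched in Python as follows. The table names `src_device_detail` and `cfg_device_attr`, the column names, and the parameter names `dt` and `device_code` are all hypothetical; the sketch uses Python's `string.Template` only to illustrate parameter substitution and does not reproduce how Azkaban itself passes parameters.

```python
from string import Template

# Hypothetical join-filter query: source detail rows joined to the configuration table.
FILTER_SQL = Template(
    "SELECT s.*\n"
    "FROM src_device_detail s\n"
    "JOIN cfg_device_attr c ON s.device_code = c.device_code\n"
    "WHERE s.dt = '${dt}' AND c.device_code = '${device_code}'"
)


def render_filter_sql(dt, device_code):
    """Fill the template with one partition value and one device code."""
    return FILTER_SQL.substitute(dt=dt, device_code=device_code)


# Example parameter values (illustrative only).
example_sql = render_filter_sql("2021-01-01", "dev001")
```

The same template text serves every device; only the two parameter values change per task. In a real deployment the parameter values should come from trusted configuration, since plain string substitution does not protect against malformed input.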

After the data source and the filtering operation are determined, the time dimension of the intermediate-layer table is determined and the table fields are divided into primitive data types and complex data types. For example, the String type of the Id, wifiType, dimCode, and indexCode fields is a primitive data type, and the map<String,String> type of the dimValue field is a complex data type. A physical data table is then created in Hive using SQL.
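The combination of primitive fields with one map-typed field can be illustrated with a small Python record type. The field names follow the example above; the class name `MiddleLayerRow` and all sample values are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import Dict


@dataclass
class MiddleLayerRow:
    """One intermediate-layer record: four string fields plus one map field."""
    id: str
    wifiType: str
    dimCode: str
    indexCode: str
    # Counterpart of map<String,String>: dimensions are stored as key:value pairs.
    dimValue: Dict[str, str] = field(default_factory=dict)


# Two devices with different dimensions share the same record structure.
heater = MiddleLayerRow("1", "typeA", "d01", "0101050204", {"region": "north"})
other = MiddleLayerRow("2", "typeB", "d02", "0101050204", {"mode": "eco", "floor": "3"})
```

Adding a new dimension for a device only adds a key to `dimValue`; the set of fields, and therefore the table structure, stays the same.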

Next, according to the service requirements, the calculation logic is classified by the number of attributes each index requires. If the data source is a fixed data source, the indexes can be divided into fixed-source single-attribute indexes, fixed-source multi-attribute indexes, and so on. If the data source is uncertain, or an index involves too many attributes or needs special handling, separate calculation logic is required. Once the calculation logic is classified, the fixed data source is calculated from the filtered intermediate-layer data using SQL combined with the Azkaban task file, with the index codes or other condition strings written into the task file as parameters, for example '01050501', '01050502' and '01050503'. This forms a calculation logic template that other devices with the same calculation logic can reuse. The calculated data is no longer distinguished by data source, and the calculation results are written into the intermediate-layer data table of the corresponding time dimension, such as hour, day, or month.
If the required result data has a fine time granularity, the subsequent vertical logic calculation can be carried out directly. If the required result is daily data, the data of the hourly intermediate layer can be summarized horizontally into the daily data table. If it is monthly data, the daily data table can be summarized into the monthly table, so that intermediate process data is retained for later troubleshooting. Even when data can be output directly to the daily intermediate table without an hourly stage, it is still recommended that monthly data be obtained by summarizing the daily data table. The data of the whole month is then calculated day by day and summarized afterwards, which avoids problems such as insufficient computing resources, excessive pressure on the server system, and overly long calculation time caused by an excessive data volume.
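The horizontal roll-up from hourly to daily to monthly data can be sketched as follows. The sketch assumes that summarizing means summing, and that timestamps are strings of the form `YYYY-MM-DD HH`; both are assumptions for illustration, as are the function name `roll_up` and the sample values.

```python
from collections import defaultdict


def roll_up(rows, key_len):
    """Sum values by timestamp prefix.

    key_len=10 groups by 'YYYY-MM-DD' (day); key_len=7 groups by 'YYYY-MM' (month).
    """
    totals = defaultdict(float)
    for timestamp, value in rows:
        totals[timestamp[:key_len]] += value
    return sorted(totals.items())


# Hypothetical hourly intermediate-layer values.
hourly = [("2021-01-01 00", 1.0), ("2021-01-01 01", 2.0), ("2021-01-02 00", 4.0)]
daily = roll_up(hourly, 10)    # hourly -> daily table
monthly = roll_up(daily, 7)    # daily -> monthly table
```

The monthly figure is computed from the small daily table rather than from all hourly rows at once, and the daily table remains available for troubleshooting.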

In the calculated data, the same id contains one or more indexes, and each index exists as a key-value pair of the form {index code: index value}, so the data needs to be split into one record per index. The index code string field can be split with the SQL clause lateral view explode(split([index_cd], ',')) s2 as [index_cd], the corresponding index value can then be read with index_value[s2.index_cd], and the resulting data is written into the index summary data table in Hive. This process does not distinguish data sources and is reusable: because the intermediate table structure, the target table structure, and the SQL logic of index splitting are all fixed, subsequent devices can continue to use it.
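The effect of the splitting step can be mirrored in a few lines of Python. The sketch assumes a row is a dictionary whose `index_cd` field is a comma-separated string of index codes and whose `index_value` field is a map from index code to value; the function name `split_indexes` and the sample values are hypothetical.

```python
def split_indexes(row):
    """Turn one row holding several indexes into one record per index."""
    codes = row["index_cd"].split(",")            # counterpart of split(index_cd, ',')
    return [
        {
            "id": row["id"],
            "index_cd": code,                     # one exploded index code per record
            "index_value": row["index_value"][code],  # map lookup by that code
        }
        for code in codes
    ]


# Hypothetical calculated row carrying two indexes for the same id.
calculated = {
    "id": "1",
    "index_cd": "01050501,01050502",
    "index_value": {"01050501": "12", "01050502": "34"},
}
records = split_indexes(calculated)
```

Each output record carries exactly one index code and its value, which is the shape an index summary table with one row per index needs.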

The data obtained after index splitting and summarization is the result the user requires. Providing this calculated result to users is the data opening process. Because opening data involves data permissions, the opened data is managed and controlled through the index configuration table. The data summary table and the index configuration table are joined on the index code to implement open control, and the joined result data is written into a data table of the Hive open data layer. The index code is bound to the device id in the index configuration table, and the device id is passed to the SQL execution program as a parameter through the Azkaban task file. The logic of the opening process is also fixed and can be reused by other devices. Data in the open layer can be synchronized to a relational database such as MySQL through DataX or other related components and made available to other users.
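The open control achieved by joining the summary table with the index configuration table can be sketched as an inner join on the index code, restricted to one device id. The dictionary keys `indexCode` and `deviceId`, the function name `open_data`, and the sample rows are assumptions made for illustration.

```python
def open_data(summary_rows, index_config, device_id):
    """Keep only summary rows whose index code is bound to device_id in the config."""
    allowed = {c["indexCode"] for c in index_config if c["deviceId"] == device_id}
    return [row for row in summary_rows if row["indexCode"] in allowed]


# Hypothetical summary rows and index configuration.
summary = [
    {"indexCode": "01050501", "value": "12"},
    {"indexCode": "01050502", "value": "34"},
]
config = [
    {"deviceId": "devA", "indexCode": "01050501"},
    {"deviceId": "devB", "indexCode": "01050502"},
]
```

Because the device id is the only input that varies, the same logic can be reused for every device; an index that is not listed in the configuration never reaches the open layer.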

In addition, the technical solution of the embodiment of the present invention uses data source classification and a grid-like layered structure in which horizontal and vertical data logic intersect. The intermediate-layer table structure is built by combining primitive data types with complex data types, which consolidates the devices into a unified intermediate table, and index coding removes the need for fields unique to each device. The calculation logic is classified and templated, and different devices and indexes are configured through configuration tables. This solves the increased maintenance difficulty and cost caused by adding data tables for each device, avoids the growth in metadata produced by table creation and the excessive occupation of cluster memory, and improves the extensibility of data dimensions and indexes. It also reduces the difficulty of modeling for a device, shortens the development cycle, improves development efficiency, reduces development cost, and increases the reusability of data and calculations. Because the process is modular, the development approach is also more flexible.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

In this embodiment, a data processing apparatus is further provided, and the apparatus is used to implement the foregoing embodiments and preferred embodiments, and details are not repeated for what has been described. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated.

Fig. 5 is a block diagram of a data processing apparatus according to an embodiment of the present invention, the apparatus including:

a first obtaining module 52, configured to obtain multiple log data reported by multiple devices, and store the multiple log data in a source data layer;

a second obtaining module 54, configured to obtain a target device requirement from a pre-configured configuration table, where the configuration table stores a plurality of device requirements, and the device requirement is used to indicate to obtain an index value of a target device from the plurality of devices;

a processing module 56, configured to extract data corresponding to the target device requirement from the plurality of log data according to the target device requirement, calculate the data corresponding to the target device requirement to determine a data result requested by the target device requirement, and store the data result in an open data layer, so that a data demander obtains the data result from the open data layer, where an intermediate data result obtained in a process of calculating the data corresponding to the target device requirement is stored in an intermediate data layer.

The present invention thus provides a data processing apparatus that obtains a plurality of log data reported by a plurality of devices and stores the log data in a source data layer; obtains a target application requirement from a pre-configured configuration table; extracts the data corresponding to the target application requirement from the plurality of log data according to the target application requirement; calculates that data to determine the data result requested by the target application requirement; and stores the data result in an open data layer so that a data demander can obtain the data result from the open data layer, wherein an intermediate data result obtained in the process of calculating the data corresponding to the target application requirement is stored in an intermediate data layer. This technical solution addresses problems such as data being difficult to integrate and storage hierarchies being disordered because different devices have independent data hierarchies, and it integrates the data of different devices.

In an alternative embodiment, the processing module 56 is configured to: obtain the calculation logic of the indexes of the plurality of devices and, for indexes with the same calculation logic, generate a calculation template corresponding to those indexes according to the calculation logic; calculate partial data in the data according to the calculation template to obtain a first calculation result; determine a target algorithm for the data other than the partial data and calculate that other data according to the target algorithm to obtain a second calculation result; and determine the data result according to the first calculation result and the second calculation result.
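The split between template-based calculation and a separately determined target algorithm can be sketched as follows. The sketch assumes the data is a mapping from index code to a list of numeric values, that the shared template is an average, and that the target algorithm for the remaining data is a maximum; these choices, the function names, and the index code `09990001` are hypothetical and serve only to show how the two results are combined.

```python
def average(values):
    """Hypothetical shared calculation logic, used as the template."""
    return sum(values) / len(values)


def compute_results(data, templates, target_algorithm):
    """Combine template-based results with results from a separate target algorithm."""
    first = {}    # first calculation result: indexes covered by a template
    second = {}   # second calculation result: all other indexes
    for index_code, values in data.items():
        if index_code in templates:
            first[index_code] = templates[index_code](values)
        else:
            second[index_code] = target_algorithm(index_code, values)
    return {**first, **second}


# Indexes with the same calculation logic share one template.
templates = {code: average for code in ("01050501", "01050502", "01050503")}
data = {"01050501": [1, 2, 3], "09990001": [5, 9, 7]}
result = compute_results(data, templates, lambda code, values: max(values))
```

Indexes that share calculation logic are handled by one function object, so supporting another device with the same logic only requires registering its index codes against the existing template.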

In an alternative embodiment, the second obtaining module 54 is further configured to configure the configuration table as follows: a target object performs a configuration operation on the configuration table to determine the configuration table, wherein the configuration operation includes at least one of: a defining operation, an adding operation, and a deleting operation. In this embodiment, data configuration filters and supplements data throughout application modeling, and the configuration data is usually stored in a data table. Within the data layering, configuration data may appear vertically in any layer, such as the source data layer or the intermediate data layer, and horizontally it generally sits in the permanent time dimension, where it is used in association with the source data or the intermediate data. One application may have one or more configuration tables. The data in a configuration table is designed, defined, and subsequently added or deleted by the data development engineer according to the application requirements. Examples of configuration tables and their data are given in Table 4, an attribute configuration table, and Table 5, an index configuration table.

In an alternative embodiment, the processing module 56 is further configured to: determine a data type of each log data of the plurality of log data, wherein the data type includes at least one of a fixed data source and a non-fixed data source, the fixed data source indicating log data to be reported by the plurality of devices, and the non-fixed data source indicating historical log data stored in the plurality of devices; and, when the data type is a fixed data source, perform a query-and-select filtering operation on the plurality of log data according to the index data corresponding to the target application requirement, so as to extract the data corresponding to the target application requirement from the plurality of log data. When the data type is a non-fixed data source, no filtering operation is required and the data is used directly for the application.

Optionally, the processing module 56 is further configured to: receive, through the open data layer, a query request sent by a target object, wherein the query request is used to obtain a target data result among the data results; and, in response to the query request, feed the target data result back to the target object when the target object is determined to have query authority. In this embodiment, because opening data involves data permissions, the opened data is managed and controlled through the index configuration table. The data summary table and the index configuration table are joined on the index code to implement open control, and the joined result data is written into a data table of the Hive open data layer. The index code is bound to the application id in the index configuration table, and the application id is passed to the SQL execution program as a parameter through the Azkaban task file. Data in the open layer can be synchronized to a relational database such as MySQL through DataX or other related components and made available to other users.

Embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.

Optionally, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:

S1, acquiring a plurality of log data reported by a plurality of devices, and storing the plurality of log data in a source data layer;

S2, obtaining a target device requirement from a pre-configured configuration table, wherein the configuration table stores a plurality of device requirements, and the device requirements are used for indicating index values of the target device obtained from the plurality of devices;

S3, extracting data corresponding to the target device requirement from the plurality of log data according to the target device requirement, calculating the data corresponding to the target device requirement to determine a data result requested by the target device requirement, and storing the data result in an open data layer for a data demander to obtain the data result from the open data layer, wherein an intermediate data result obtained in the process of calculating the data corresponding to the target device requirement is stored in an intermediate data layer.

In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and exemplary embodiments, and details of this embodiment are not repeated herein.

Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

It will be apparent to those skilled in the art that the various modules or steps of the invention described above may be implemented using a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and they may be implemented using program code executable by the computing devices, such that they may be stored in a memory device and executed by the computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into various integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A data processing method, comprising:

acquiring a plurality of log data reported by a plurality of devices, and storing the plurality of log data in a source data layer;

acquiring a target application requirement from a pre-configured configuration table, wherein the configuration table stores a plurality of application requirements, and the application requirements are used for indicating index values of target devices acquired from the plurality of devices;

extracting data corresponding to the target application requirement from the plurality of log data according to the target application requirement, calculating the data corresponding to the target application requirement to determine a data result requested by the target application requirement, and storing the data result in an open data layer for a data demander to obtain the data result from the open data layer, wherein an intermediate data result obtained in the process of calculating the data corresponding to the target application requirement is stored in an intermediate data layer.

2. The method of claim 1, wherein computing data corresponding to the target application requirements to determine the data result requested by the target application request comprises:

determining a calculation template required for calculating partial data in the data, and calculating the partial data according to the calculation template to obtain a first calculation result;

determining a target algorithm for other data except the partial data in the data, and calculating the other data according to the target algorithm to obtain a second calculation result;

and determining the data result according to the first calculation result and the second calculation result.

3. The method of claim 1, wherein determining a computation template required to compute a portion of the data comprises:

obtaining calculation logic of the indexes of the plurality of devices;

and for the indexes with the same calculation logic, generating a calculation template corresponding to the indexes according to the calculation logic.

4. The method of claim 1, wherein before obtaining the target application requirement from the pre-configured configuration table, the method further comprises:

configuring the configuration table by:

performing a configuration operation on the configuration table by a target object to determine the configuration table, wherein the configuration operation includes at least one of: defining operation, adding operation and deleting operation.

5. The method of claim 1, wherein extracting data corresponding to the target application requirement from the plurality of log data according to the target application requirement comprises:

determining a data type of each log data of the plurality of log data, wherein the data type includes at least one of a fixed data source and a non-fixed data source, the fixed data source indicating log data to be reported by the plurality of devices, and the non-fixed data source indicating historical log data stored in the plurality of devices;

and determining the filtering operation of each log data in the plurality of log data according to the data type so as to extract the data corresponding to the target application requirement from the plurality of log data according to the target application requirement.

6. The method of claim 5, wherein determining a filtering operation for each log data of the plurality of log data according to the data type to extract data corresponding to the target application requirement from the plurality of log data according to the target application requirement comprises:

and under the condition that the data type is a fixed data source, executing a filtering operation of query selection in the plurality of log data according to the index data corresponding to the target application request so as to extract the data corresponding to the target application request from the plurality of log data according to the target application request.

7. The method of claim 1, wherein after saving the data results in an open data layer, the method further comprises:

receiving an inquiry request sent by a target object through the open data layer, wherein the inquiry request is used for acquiring a target data result in the data results;

and responding to the query request, and feeding the target data result back to the target object under the condition that the target object is determined to have the query authority.

8. The method according to any one of claims 1 to 7, wherein one or more first data tables correspond to the source data layer, one or more second data tables correspond to the intermediate data layer, and one or more third data tables correspond to the open data layer, wherein the table structures of the first data tables, the second data tables, and the third data tables are the same.

9. A data processing apparatus, comprising:

a first obtaining module, configured to obtain a plurality of log data reported by a plurality of devices and store the plurality of log data in a source data layer;

a second obtaining module, configured to obtain a target application requirement from a pre-configured configuration table, where the configuration table stores a plurality of application requirements, and the application requirement is used to indicate to obtain an index value of a target device from the plurality of devices;

a processing module, configured to extract data corresponding to the target application requirement from the plurality of log data according to the target application requirement, calculate the data corresponding to the target application requirement to determine a data result requested by the target application request, and store the data result in an open data layer for a data demander to obtain the data result from the open data layer, wherein an intermediate data result obtained in the process of calculating the data corresponding to the target application requirement is stored in an intermediate data layer.

10. A computer-readable storage medium, comprising a stored program, wherein the program is operable to perform the method of any one of claims 1 to 8.

11. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 8 by means of the computer program.