Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Reference herein to "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
For convenience of understanding, terms referred to in the embodiments of the present application are explained below.
Automatic data query: in the related art, most report processing software interacts with the report's database in a way that depends on how familiar the processing personnel are with SQL statements. The method of the present application is suited to an automatic data query scenario, in which the processing personnel no longer need to master SQL statements, and the data of the report is determined and subsequently retrieved automatically. To realize this scenario, information must be identified automatically; the present application therefore provides an information identification method applied to automatic data query.
Target report: in the present application, a report refers to a table with a table header, a table body, and table notes, such as a financial report, and the target report refers to the report object currently being processed. It should be noted that, in the present application, the target report only needs to have header information: the header information is analyzed to obtain the input information for the automatic data query, thereby realizing the automatic data query scenario, and the coordinate position of the data to be filled in the target report is derived from the operational relationship between spatial coordinates.
Four-dimensional coordinates: for the coordinate format of the information, the present application is described with an example of a coordinate structure of a row start, a row end, a column start and a column end, where a column coordinate range refers to a range between the column start and the column end, and a row coordinate range refers to a range between the row start and the row end, but the present application is not limited thereto.
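The coordinate structure described above can be sketched as follows. This is a minimal illustration in Python; the class name `Coord4` and its field names are assumptions made for the example and do not appear in the application.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Coord4:
    """Hypothetical four-dimensional coordinate:
    (row start, row end, column start, column end)."""
    row_start: int
    row_end: int
    col_start: int
    col_end: int

    @property
    def row_range(self) -> Tuple[int, int]:
        # Range between the row start and the row end.
        return (self.row_start, self.row_end)

    @property
    def col_range(self) -> Tuple[int, int]:
        # Range between the column start and the column end.
        return (self.col_start, self.col_end)


cell = Coord4(1, 2, 1, 4)
print(cell.row_range, cell.col_range)
```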
Example 1
Referring to fig. 1, a flowchart of an information identification method applied to an automatic data query according to an exemplary embodiment of the present application is shown, where the method includes:
Step 101, receiving the typesetting information of the target report.
In actual operation, the information identification method applied to automatic data query provided by the embodiment of the present application is described by taking a computer program as an example. Fig. 2 shows a layout diagram of a target report, in which an interface 200 is a third-party software interface displayed by a terminal device; when a user uses the third-party software and provides, for example, a target report 210, the computer program carrying the present application implements the information identification method. Optionally, the interface 200 may also be the interface of the computer program itself when the computer program runs in an embedded manner, which is not limited in this embodiment of the present application.
An interface identification module automatically generates the typesetting information for the target report, and the typesetting information of the target report is then received.
Optionally, the layout information includes at least one of row header information and column header information of the target report, where the receiving format of the layout information consists of four-dimensional coordinates and a header name, and the row header information and the column header information provide the necessary input information for automatic data query.
As shown in fig. 2, the row header includes the cells in which the header names "region", "product line", "jingjin", "shanghai", "south", "products 1", "products 2", "total 1", "beijing", "tianjin", "total 2", "shenzhen", and "guangzhou" are located, and the column header includes the cells in which the header names "time", "salesman", "2020 year", "2021 year", "group leader", "first name", "second name", "average", "number of people", "amount to be paid", "month 1", "month 2", "total 3", "month 4", and "total 4" are located. The numbering of the months and the products is only schematic: the header names to be expanded can be determined by the program from the actual data it reads. For example, "month 1" can be expanded according to the months actually involved to obtain month 1, month 2, and so on, where the expansion is performed automatically by the program over the sub-level concepts under the current header concept according to the actual data read; the embodiment of the present application is not limited thereto.
Correspondingly, the receiving format of the row header information and the column header information may further include additional information. In one example, as shown in fig. 2, the row header information is "jingjin": in the corresponding information format, the four-dimensional coordinates are (5,5,5,7), the header name is "jingjin", and there is no additional information (if additional information were added, it could be written as, for example, "sales/1000"). In another example, as shown in fig. 2, the column header information is "group leader": in the corresponding information format, the four-dimensional coordinates are (15,16,3,4), the header name is "group leader", and there is no additional information (if additional information were added, it could be written as, for example, "person name").
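The receiving format described above (four-dimensional coordinates, a header name, and optional additional information) might be represented as a simple record. The following is a sketch only; the class name `HeaderInfo` and its field names are hypothetical, while the coordinate values and header names are taken from the two examples in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HeaderInfo:
    """Hypothetical record for one piece of header information."""
    coords: Tuple[int, int, int, int]  # (row start, row end, column start, column end)
    name: str                          # header name
    extra: Optional[str] = None        # additional information, absent by default


# The two examples from the text, without additional information.
row_header = HeaderInfo((5, 5, 5, 7), "jingjin")
column_header = HeaderInfo((15, 16, 3, 4), "group leader")

# The same row header with the optional additional information filled in.
annotated = HeaderInfo((5, 5, 5, 7), "jingjin", extra="sales/1000")
```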
Step 102, identifying a row header area and a column header area corresponding to the target report according to the typesetting information.
As shown in fig. 2, after step 101 the computer automatically identifies the row header area and the column header area corresponding to the target report according to the typesetting information. Because the header names have hierarchical and inclusion relationships with one another, the row header and the column header do not merely consist of one or several header names at the same level; they form a row header area and a column header area.
The four-dimensional coordinates of the row header area correspond to the row header information, and the four-dimensional coordinates of the column header area correspond to the column header information.
Tables 1 and 2 show the information of the row header area and the column header area in table form. As is clear from Tables 1 and 2, the four-dimensional coordinates of the row header area and the column header area are provided by the four-dimensional coordinates in the row header information and the column header information, respectively.
TABLE 1
TABLE 2
Step 103, identifying a row node system according to the four-dimensional coordinates of the row header area, and identifying a column node system according to the four-dimensional coordinates of the column header area.
As mentioned in step 102, the header names have hierarchical and inclusion relationships, so the row header and the column header do not merely consist of one or several header names at the same level but form a row header area and a column header area. In step 103, taking these hierarchical and inclusion relationships into account, the row node system is further identified according to the four-dimensional coordinates of the row header area, and the column node system is identified according to the four-dimensional coordinates of the column header area.
Fig. 3 is a schematic structural diagram illustrating a row node architecture provided in an embodiment of the present application, and fig. 4 is a schematic structural diagram illustrating a column node architecture provided in an embodiment of the present application. Each node hierarchy corresponds to table 1 and table 2.
Step 104, performing a permutation and combination operation on all end nodes of the row node system and all end nodes of the column node system to obtain a coordinate intersection result.
In one example, as shown in fig. 3, the end nodes of the row node system are "total 1", "Beijing", "Tianjin", "Shanghai", "total 2", "Shenzhen", "Guangzhou", "products 1", and "products 2", and, as shown in fig. 4, the end nodes of the column node system are "month 1", "month 2", "total 3", "month 4", "total 4", "group leader", "first name", "second name", "average", "number of people", and "amount to be returned".
Further, according to the information of the row header area and the column header area shown in tables 1 and 2, the four-dimensional coordinates of each header name in the above example are obtained, and are subjected to permutation and combination operation, so that a coordinate intersection result can be obtained.
As shown in fig. 2, performing the permutation and combination operation on "total 1" and "month 1" yields cell information with four-dimensional coordinates (7,7,5,5); the remaining coordinate intersection results are obtained in the same way.
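The permutation and combination operation can be sketched as a Cartesian product in which each row end node contributes its column range and each column end node contributes its row range. The function name `cross` is hypothetical, and the coordinates assigned below to "total 1", "beijing", and "month 1" are assumptions chosen so that the first intersection reproduces the (7,7,5,5) cell mentioned in the text.

```python
from itertools import product
from typing import List, Tuple

Coord4 = Tuple[int, int, int, int]  # (row start, row end, column start, column end)
Node = Tuple[str, Coord4]


def cross(row_end_nodes: List[Node], col_end_nodes: List[Node]) -> List[Tuple[str, str, Coord4]]:
    """Pair every row end node with every column end node.

    The row end node supplies the column range of the target cell,
    and the column end node supplies its row range.
    """
    results = []
    for (r_name, r), (c_name, c) in product(row_end_nodes, col_end_nodes):
        cell = (c[0], c[1], r[2], r[3])
        results.append((r_name, c_name, cell))
    return results


# Assumed coordinates, for illustration only.
row_ends = [("total 1", (6, 6, 5, 5)), ("beijing", (6, 6, 6, 6))]
col_ends = [("month 1", (7, 7, 4, 4))]
for item in cross(row_ends, col_ends):
    print(item)
```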
Step 105, sorting the coordinate intersection result to obtain at least one group of identification information.
The at least one group of identification information includes row header information, column header information, and coordinate information of data to be filled, where the coordinate information of the data to be filled provides the coordinate position of the data to be filled on the target report after automatic data query. The cell area obtained by the intersection in step 104 is the area to be filled with data.
It should be noted that a report query is formed by two necessary factors: the query of the target data source and the determination of the position of the query result. The first necessary factor is realized by the rows and columns that participate in the intersection in step 104, that is, the target data source is pointed to according to the row header information and the column header information; step 105 realizes the second necessary factor, that is, the coordinate position of the data to be filled is determined according to the coordinate information of the data to be filled, thereby completing one report query.
In the related art, for example in the broad BI reporting industry, report processing software relies on the operator to determine these two necessary factors manually (that is, to write, at a designated coordinate position, an SQL query or an SQL-like query formula defined by the software vendor). This processing mode is time-consuming, and its effect depends heavily on the operator's manual skill, so the level and effect of report processing vary, and the header information goes unused.
To sum up, for the automatic data query scenario, the embodiments of the present application provide an information identification method applied to automatic data query. The method automatically identifies the row header area and the column header area corresponding to a target report according to the received layout information of the target report, further identifies a row node system and a column node system according to the header information, and obtains at least one group of identification information by spatial position analysis. The identification information includes row header information, column header information, and coordinate information of data to be filled, where the coordinate information of the data to be filled provides the coordinate position of the data to be filled on the target report after automatic data query. This overcomes the difficulty of collecting information before automatic data query (for example, before SQL or other database query statements are written automatically). The automatic data query scenario of the present application replaces the approach in the related art in which processing personnel write SQL or other database query statements to fill in report data, which greatly reduces the cost of manual information collection and processing and provides a necessary premise for the subsequent automatic processing of the target report.
Example 2
Referring to fig. 5, a flowchart of an information identification method applied to an automatic data query according to another exemplary embodiment of the present application is shown, where the method includes:
Step 501, receiving typesetting information of a target report.
Please refer to step 101, which is not described herein again.
Step 502, identifying a row header area and a column header area corresponding to the target report according to the typesetting information.
Please refer to step 102, which is not described herein again.
In one possible implementation, step 502 may be followed by either step 503 or step 505, with the order of execution being shown in fig. 5.
Step 503, traversing the row end coordinate and the column coordinate range of each row node to identify the corresponding child nodes, where a row node that is not identified as a child node is a root node of the row node system, and a row node that has no child node is an end node of the row node system.
The child nodes and the corresponding row nodes have row adjacency relation and column inclusion relation, the row adjacency relation is determined according to the relation between the row start coordinates of the child nodes and the row end coordinates of the corresponding row nodes, and the column inclusion relation is determined according to the column coordinate range of the child nodes and the column coordinate range of the corresponding row nodes.
In one example, the object for which child nodes are currently being identified is row node A, whose four-dimensional coordinates are (1,2,1,4), and there is also a row node B whose four-dimensional coordinates are (3,4,1,2). In a possible embodiment, the row adjacency condition is that the row end coordinate of row node A plus 1 equals the row start coordinate of row node B; since the row end coordinate of row node A is 2 and the row start coordinate of row node B is 3, row node A and row node B have a row adjacency relationship. Further, the column coordinate range of row node A is (1,4) and the column coordinate range of row node B is (1,2); in a possible embodiment, the column inclusion condition is that the column coordinate range of row node A includes the column coordinate range of row node B, so row node A and row node B have a column inclusion relationship. Because both the row adjacency relationship and the column inclusion relationship are satisfied, row node B is identified as a child node of row node A.
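The two conditions in this example can be expressed as a small predicate. The sketch below is illustrative; the function name `is_row_child` is hypothetical, and the coordinates of row node A and row node B are those given in the example.

```python
from typing import Tuple

Coord4 = Tuple[int, int, int, int]  # (row start, row end, column start, column end)


def is_row_child(parent: Coord4, child: Coord4) -> bool:
    """Return True if `child` satisfies both conditions with respect to `parent`."""
    # Row adjacency: the parent's row end coordinate + 1 equals the child's row start coordinate.
    row_adjacent = parent[1] + 1 == child[0]
    # Column inclusion: the parent's column range includes the child's column range.
    col_included = parent[2] <= child[2] and child[3] <= parent[3]
    return row_adjacent and col_included


node_a = (1, 2, 1, 4)
node_b = (3, 4, 1, 2)
print(is_row_child(node_a, node_b))  # row node B is a child node of row node A
```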
Step 504, sorting the root nodes and the child nodes to obtain the row node system.
Optionally, considering that the layout information may further include slash header information, step 504 may include the following items 1 to 3.
Item 1, identifying a preset root node of the row node system according to the slash division of the slash header information.
Item 2, determining the root nodes obtained in the traversal process as child nodes of the preset root node.
Item 3, sorting the preset root node and the child nodes to obtain the row node system.
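The three items above can be sketched as attaching the roots found during traversal under the preset root node. The function name `attach_preset_root`, the dictionary representation of the node system, and the node names "root 1", "root 2", and "leaf 1" are all assumptions made for illustration; only "header name 1" comes from the description of fig. 6.

```python
from typing import Dict, List


def attach_preset_root(preset_root: str,
                       traversal_roots: List[str],
                       children: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a new parent-to-children map in which every root found by
    traversal becomes a child node of the preset root node."""
    system = {name: list(kids) for name, kids in children.items()}
    system[preset_root] = list(traversal_roots)
    return system


# Hypothetical node system obtained from traversal, before the slash header is applied.
children = {"root 1": ["leaf 1"], "root 2": [], "leaf 1": []}
system = attach_preset_root("header name 1", ["root 1", "root 2"], children)
print(system["header name 1"])
```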
Correspondingly, the slash header refers to the area 211 in fig. 2. If the area 211 currently contains invalid information, the layout information includes at least one of the row header information and the column header information of the target report; if the slash header contains valid information, the layout information includes at least one of the row header information, the column header information, and the slash header information of the target report.
Fig. 6 shows a schematic diagram in which the slash header information is valid information; the slash header information is displayed in the area 211.
In one example, two common slash header cases are provided according to the common slash header types, shown in fig. 6 (a) and fig. 6 (b), respectively. Taking (a) as an example, the slash header information includes "header name 1" information, "header name 2" information, and "header name 3" information; header name 1 is the root node with the highest priority in the row node system (denoted as a preset root node), and header name 3 is the root node with the highest priority in the column node system (likewise denoted as a preset root node). In fig. 6 (a) and fig. 6 (b), the numbering of the header names is merely schematic.
Step 505, traversing the column end coordinate and the row coordinate range of each column node to identify the corresponding child nodes, where a column node that is not identified as a child node is a root node of the column node system, and a column node that has no child node is an end node of the column node system.
The child nodes and the corresponding column nodes have a column adjacency relation and a row inclusion relation, the column adjacency relation is determined according to the relation between the column start coordinates of the child nodes and the column end coordinates of the corresponding column nodes, and the row inclusion relation is determined according to the row coordinate range of the child nodes and the row coordinate range of the corresponding column nodes.
In one example, the object for which child nodes are currently being identified is column node C, whose four-dimensional coordinates are (7,10,3,3), and there is also a column node D whose four-dimensional coordinates are (7,7,4,4). In a possible embodiment, the column adjacency condition is that the column end coordinate of column node C plus 1 equals the column start coordinate of column node D; since the column end coordinate of column node C is 3 and the column start coordinate of column node D is 4, column node C and column node D have a column adjacency relationship. Further, the row coordinate range of column node C is (7,10) and the row coordinate range of column node D is (7,7); in a possible embodiment, the row inclusion condition is that the row coordinate range of column node C includes the row coordinate range of column node D, so column node C and column node D have a row inclusion relationship. Because both the column adjacency relationship and the row inclusion relationship are satisfied, column node D is identified as a child node of column node C.
Step 506, the root node and the child nodes are sorted to obtain a column node system.
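Steps 505 and 506 can be sketched together as follows: child nodes are found by pairwise comparison, after which root nodes and end nodes follow from the definitions above. The function names are hypothetical, and the quadratic pairwise scan is just one simple way to carry out the traversal. Column nodes C and D use the coordinates from the example, while column node E is an assumed extra node added for illustration.

```python
from typing import Dict, List, Tuple

Coord4 = Tuple[int, int, int, int]  # (row start, row end, column start, column end)


def is_column_child(parent: Coord4, child: Coord4) -> bool:
    # Column adjacency: the parent's column end coordinate + 1 equals the child's column start coordinate.
    col_adjacent = parent[3] + 1 == child[2]
    # Row inclusion: the parent's row range includes the child's row range.
    row_included = parent[0] <= child[0] and child[1] <= parent[1]
    return col_adjacent and row_included


def build_column_system(nodes: Dict[str, Coord4]):
    """Return (children map, root nodes, end nodes) for the given column nodes."""
    children: Dict[str, List[str]] = {name: [] for name in nodes}
    identified_as_child = set()
    for p_name, p in nodes.items():
        for c_name, c in nodes.items():
            if p_name != c_name and is_column_child(p, c):
                children[p_name].append(c_name)
                identified_as_child.add(c_name)
    roots = [n for n in nodes if n not in identified_as_child]  # never identified as a child node
    ends = [n for n in nodes if not children[n]]                # have no child node
    return children, roots, ends


nodes = {"C": (7, 10, 3, 3), "D": (7, 7, 4, 4), "E": (8, 8, 4, 4)}  # E is assumed
children, roots, ends = build_column_system(nodes)
print(children, roots, ends)
```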
Optionally, considering that the layout information may further include slash header information, step 506 may include the following items 1 to 3.
Item 1, identifying a preset root node of the column node system according to the slash division of the slash header information.
Item 2, determining the root nodes obtained in the traversal process as child nodes of the preset root node.
Item 3, sorting the preset root node and the child nodes to obtain the column node system.
Step 507, acquiring column coordinates of all end nodes in the row node system.
Step 508, acquiring the row coordinates of all the end nodes in the column node system.
Step 509, intersecting the column coordinates and the row coordinates to obtain a coordinate intersection result.
Step 510, sorting the coordinate intersection result to obtain at least one group of identification information.
In one example, "Shenzhen" is taken from the row node system and "month 2" from the column node system. In this example, according to the node associations of "Shenzhen" in the row node system, the program can automatically infer the information "south China" and "region" (that is, automatically infer the target data source of "Shenzhen"); the same applies to "month 2". In addition, the program can derive the filtering conditions for the automatic data query from the automatic inference result.
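The inference described in this example amounts to walking from an end node up to its root node. A minimal sketch follows, assuming the node system is stored as a child-to-parent map; the function name `ancestor_path` and the map itself are hypothetical, and the chain "Shenzhen", "south China", "region" follows the example in the text.

```python
from typing import Dict, List, Optional


def ancestor_path(node: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return the node followed by its ancestors, ending at the root node."""
    path = [node]
    current = parent_of.get(node)
    while current is not None:
        path.append(current)
        current = parent_of.get(current)
    return path


# Assumed child-to-parent map for part of the row node system.
parent_of = {"Shenzhen": "south China", "south China": "region", "region": None}
print(ancestor_path("Shenzhen", parent_of))
```

The resulting path could then serve as the input from which filtering conditions are derived.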
Correspondingly, the target data source automatically inferred by the program and the end node information form a coordinate intersection result, and the identification information is obtained by sorting and summarizing this information. For example, the following SQL statement may be obtained:
SELECT SUM(sales) AS sales_statistics
FROM sales_report
WHERE region = 'Shenzhen' AND time = '2020, month 2'
Further, "Shenzhen" intersects with "month 1", and the corresponding SQL statement can be obtained:
SELECT SUM(sales) AS sales_statistics
FROM sales_report
WHERE region = 'Shenzhen' AND time = '2020, month 1'
Since "month 1" and "month 2" are not specific months but expansion items of the concept "2020", the two SQL statements may be merged as follows:
SELECT month AS month, SUM(sales) AS sales_statistics
FROM sales_report
WHERE region = 'Shenzhen' AND time = '2020'
GROUP BY month
It should be noted that, when the result of the above grouped statistics contains more time items than the two placeholders (month 1 and month 2), the program automatically inserts the corresponding rows or columns in the data filling coordinate area to accommodate them.
Schematically, fig. 7 shows a filled data table 700 obtained through automatic information identification by the program in the automatic data query scenario according to an embodiment of the present application. As can be seen from fig. 7, for the current sales report, the months under 2020 are expanded to months 1 to 12, corresponding to the target report 210 shown in fig. 2.
In the embodiment of the application, the construction principle of a row node system and a column node system is further disclosed, and the analysis of the spatial position is realized through the construction of the node system; furthermore, the slash header condition in the report is also considered, and the scenes encountered by recognition are further enriched.
Referring to fig. 8, a block diagram of an information identification apparatus for automatic data query according to an embodiment of the present application is shown. The apparatus may be implemented as all or part of a computer device in software, hardware, or a combination of both. The device includes:
an information receiving module 801, configured to receive layout information of a target report, where the layout information includes at least one of row header information and column header information of the target report, a receiving format of the layout information consists of four-dimensional coordinates and a header name, and the row header information and the column header information are used to provide necessary input information for automatic data query;
an area identification module 802, configured to identify a row header area and a column header area corresponding to the target report according to the layout information, where the four-dimensional coordinates of the row header area correspond to the row header information, and the four-dimensional coordinates of the column header area correspond to the column header information;
a system identification module 803, configured to identify a row node system according to the four-dimensional coordinates of the row header area, and identify a column node system according to the four-dimensional coordinates of the column header area, where each node system includes a root node, a child node, and an end node;
a coordinate operation module 804, configured to perform a permutation and combination operation on all end nodes of the row node system and all end nodes of the column node system to obtain a coordinate intersection result;
an information integration module 805, configured to sort the coordinate intersection result to obtain at least one group of identification information, where the at least one group of identification information includes the row header information, the column header information, and coordinate information of data to be filled, and the coordinate information of the data to be filled is used to provide a coordinate position of the data to be filled on the target report after automatic data query.
Optionally, the system identification module 803 includes:
the first identification unit is used for traversing the row end coordinates and the column coordinate ranges of all the row nodes to identify the corresponding child nodes, the child nodes and the corresponding row nodes have row adjacent relation and column containing relation, the row adjacent relation is determined according to the relation between the row start coordinates of the child nodes and the row end coordinates of the corresponding row nodes, the column containing relation is determined according to the column coordinate ranges of the child nodes and the column coordinate ranges of the corresponding row nodes, the row nodes which are not identified as the child nodes are root nodes of the row node system, and the row nodes which do not have the child nodes are end nodes of the row node system;
and the second identification unit is used for sorting the root node and the child nodes to obtain the row node system.
Optionally, the system identification module 803 further includes:
a third identifying unit, configured to traverse the column end coordinate and the row coordinate range of each column node to identify the corresponding child nodes, where the child node and the corresponding column node have a column adjacency relation and a row inclusion relation, the column adjacency relation is determined according to the relation between the column start coordinate of the child node and the column end coordinate of the corresponding column node, and the row inclusion relation is determined according to the row coordinate range of the child node and the row coordinate range of the corresponding column node, where a column node that is not identified as a child node is a root node of the column node system, and a column node that has no child node is an end node of the column node system;
and the fourth identification unit is used for sorting the root node and the child nodes to obtain the column node system.
Optionally, in response to that the layout information further includes slash header information, the second identifying unit is further configured to:
identifying a preset root node of the row node system according to the slash division of the slash header information;
determining the root node obtained in the traversal process as a child node of the preset root node;
and arranging the preset root node and the child nodes to obtain the row node system.
Optionally, in response to that the layout information further includes slash header information, the fourth identifying unit is further configured to:
identifying a preset root node of the column node system according to the slash division of the slash header information;
determining the root node obtained in the traversal process as a child node of the preset root node;
and arranging the preset root node and the child nodes to obtain the column node system.
Optionally, the coordinate operation module 804 includes:
the first operation unit is used for acquiring the column coordinates of all the end nodes in the row node system;
the second operation unit is used for acquiring the row coordinates of all the tail end nodes in the column node system;
and the third operation unit is used for intersecting the column coordinates and the row coordinates to obtain a coordinate intersection result.
The present embodiments also provide a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the information identification method applied to automatic data query as provided in the above embodiments.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM).
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.