Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As mentioned earlier, more and more data analysis tools are emerging for users to perform data analysis. Some of these tools allow a user to configure a data query through a visual language (VL), also called a visual programming language, and then display the data query result as a visual chart. For intuitive understanding, fig. 1 shows a VL configuration area 101 and a presentation area 102 for the visual chart.
If the user's query requirement changes, the user needs to manually modify the VL configuration information in the interface. For example, referring to fig. 2, if the user wants to further analyze the unit price sum indicator after viewing the interface of fig. 1, the user needs to click the indicator icon 201 in the configuration area, find the unit price icon 202 in the drop-down menu, and then click the unit price sum icon 203. The user thus has to look back and forth between the chart area and the configuration area to modify the VL configuration information, which typically interrupts the user's flow state during analysis.
Based on this, the inventors propose a scheme that supports a user in directly initiating a query by interacting with elements in the visual chart, while the related interaction information is synchronously converted into VL, so that operating on the visual chart triggers linked updates of the chart and the VL. Fig. 3 illustrates a change of the query interface according to an embodiment: in response to a user click on a column header 301 of the unit price sum column in a table, a pull-down menu 302 is displayed; then, in response to a user click on an intra-group proportion item 303, a unit price sum-proportion column 304 is added on the right side of the statistical table, and a unit price sum-proportion configuration component 305 is added to the indicator row in the VL configuration area.
Therefore, the scheme disclosed in the embodiments of this specification gives the user a smooth visual and operational experience and effectively saves the time and effort the user spends on data analysis.
The following describes the implementation steps of the above scheme with reference to specific examples. Fig. 4 is a flowchart illustrating a method for querying data according to an embodiment. The method may be executed by any apparatus, platform, or device cluster with computing and processing capabilities, for example, data analysis software or a user terminal on which data analysis software is installed. As shown in fig. 4, the method comprises the following steps:
Step S410, displaying first configuration information in a visual language form and a corresponding first result chart in a query interface;
Step S420, receiving a query instruction initiated by a user based on an element in the first result chart;
Step S430, updating and displaying the first configuration information as second configuration information based on the query instruction;
Step S440, updating and displaying the first result chart as a second result chart corresponding to the second configuration information.
The above steps are described in detail as follows:
First, in step S410, first configuration information in a visual language form and a corresponding first result chart are displayed in a query interface. It is to be understood that a visual language (VL) is a computer language that visually represents the objects, concepts and procedures of a computing process, and is a multi-dimensional structure expressed through the spatial arrangement of graphical symbols. For example, the graphical symbols shown in the configuration area 101 in fig. 1, such as unit price sum, logistics mode, indicator, and dimension, belong to the visual language, and the displayed VL content forms the corresponding VL configuration information.
In one embodiment, this step may comprise the following. First, first configuration information input by a user based on the query interface is received. It is to be understood that the query interface supports user interaction and may be a client interface provided by a data analysis tool or the like. In one example, the first configuration information received via the configuration area 101 shown in fig. 1 includes: the unit price sum indicator, the order date (day) and logistics mode dimensions, and the filter condition that the order date falls within the last 7 days.
Then, a data query script is determined based on the first configuration information, and the data storage system is queried based on the data query script to obtain a first data query result. It should be understood that the scripting language of the data query script is adapted to the data storage system. For example, if the data storage system is a database, the scripting language of the corresponding query script may be Structured Query Language (SQL); as another example, if the files in the data storage system are Excel spreadsheets, the scripting language of the corresponding query script may be Python. In one embodiment, the first configuration information is input into a pre-trained machine learning model to obtain the data query script as output. However, training such a machine learning model depends on a large training corpus and a large amount of computation, so this approach is difficult to implement in practice.
In another embodiment, the first configuration information is first converted into a first query text in natural language (NL), and the first query text is then processed in a fully controllable and interpretable manner, so that it is accurately translated into the corresponding query script.
For the conversion from VL configuration information to NL query text, in a specific embodiment, the conversion can be implemented by a preset conversion rule; in another specific embodiment, an intermediate language may be designed, such that a syntax tree corresponding to the intermediate language is constructed based on the VL configuration information, and the corresponding NL text is determined from the syntax tree. On the other hand, in a specific embodiment, the converted NL text may also be presented in the query interface for the user to view. Illustratively, as shown in fig. 5, the NL text obtained by converting the VL configuration information in the configuration area 501 is shown in a text box 502: the sum of unit prices for each order date and each logistics mode in the last 7 days. In this way, the conversion from VL to NL can be achieved.
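For illustration only, the rule-based conversion from VL configuration information to NL text may be sketched in Python as follows. The class VLConfig, the function vl_to_nl and the sentence template are assumptions made for this sketch and are not defined by the embodiments; an actual conversion rule may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VLConfig:
    """Hypothetical container for the graphical symbols in the configuration area."""
    indicators: List[str]
    dimensions: List[str]
    filters: List[str] = field(default_factory=list)


def vl_to_nl(config: VLConfig) -> str:
    """Render VL configuration information as NL text using one fixed template (an assumed rule)."""
    text = " and ".join(config.indicators)
    if config.dimensions:
        text += " for each " + " and each ".join(config.dimensions)
    if config.filters:
        text += ", filtered by " + "; ".join(config.filters)
    return text


config = VLConfig(
    indicators=["unit price sum"],
    dimensions=["order date", "logistics mode"],
    filters=["order date in the last 7 days"],
)
print(vl_to_nl(config))
```

A fixed template keeps the conversion deterministic, which is consistent with the controllable character of the preset-rule embodiment.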
To determine the corresponding data query script based on the first query text, entity recognition, syntactic analysis, semantic analysis and script conversion are performed on the first query text, so that the natural language is translated step by step into the query scripting language in a fully controllable and interpretable manner.
In a specific embodiment, entity recognition is first performed on the first query text to obtain a plurality of word segments (tokens) and the entity category corresponding to each word segment; the word segments are then converted into a plurality of query elements according to their entity categories, where the query elements are associated with metadata in the data storage system; next, syntactic analysis is performed on the word segments to obtain a number of element combinations formed from the query elements; semantic analysis is then performed on the element combinations to obtain a number of query objects; finally, a data query script is constructed based on the query objects.
Further, in a more specific embodiment, the entity recognition may be implemented as: acquiring a plurality of pre-constructed dictionaries corresponding to a plurality of entity categories; and matching the first query text against the plurality of dictionaries to obtain each word segment and its entity category.
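As a minimal sketch of the dictionary matching described above, the following Python code assumes that the query text consists of whitespace-separated words and uses greedy longest-match lookup. The dictionary contents, the category names and the matching strategy are illustrative assumptions; the embodiments do not prescribe a particular matching algorithm.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical pre-constructed dictionaries, one per entity category.
DICTIONARIES: Dict[str, Set[str]] = {
    "indicator": {"unit price sum"},
    "dimension": {"order date", "logistics mode", "gender"},
    "dimension_value": {"female", "male"},
    "operator": {"not"},
}


def recognize_entities(text: str, dictionaries: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (word segment, entity category) pairs found by greedy longest match."""
    lookup = {term: category for category, terms in dictionaries.items() for term in terms}
    max_words = max(len(term.split()) for term in lookup)
    words = text.lower().split()
    segments: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so that multi-word terms win.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in lookup:
                segments.append((candidate, lookup[candidate]))
                i += n
                break
        else:
            i += 1  # No dictionary entry starts here; skip this word.
    return segments


print(recognize_entities("gender not male", DICTIONARIES))
```

Matching on whole words rather than substrings prevents "male" from being found inside "female".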
In a more specific embodiment, the query element conversion may be implemented as: for each word segment, converting the word segment into a corresponding query element based on the conversion rule corresponding to its entity category. Illustratively, if the entity category of a word segment is a dimension value (e.g., female), the corresponding query element (e.g., u.sex = female) is formed from the field value (e.g., female) in the data storage system that matches the word segment, together with the corresponding field name (e.g., sex) and table name (e.g., u).
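The per-category conversion rules can be illustrated with the following Python sketch. The metadata tables DIMENSIONS, VALUE_INDEX and OPERATORS and the function to_query_element are hypothetical stand-ins for the metadata of the data storage system; only the u.sex example values are taken from the description.

```python
from typing import Dict, Tuple

# Hypothetical metadata: dimension name -> (table name, field name).
DIMENSIONS: Dict[str, Tuple[str, str]] = {"gender": ("u", "sex")}
# Hypothetical index: field value -> (table name, field name) where the value occurs.
VALUE_INDEX: Dict[str, Tuple[str, str]] = {"female": ("u", "sex"), "male": ("u", "sex")}
# Hypothetical operator words and their symbols.
OPERATORS: Dict[str, str] = {"not": "≠"}


def to_query_element(segment: str, category: str) -> str:
    """Convert one word segment into a query element using the rule for its entity category."""
    if category == "dimension":
        table, field_name = DIMENSIONS[segment]
        return f"{table}.{field_name}"
    if category == "dimension_value":
        table, field_name = VALUE_INDEX[segment]
        return f"{table}.{field_name} = {segment}"
    if category == "operator":
        return OPERATORS[segment]
    raise ValueError(f"no conversion rule for entity category: {category}")


print(to_query_element("female", "dimension_value"))
```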
In a more specific embodiment, determining the element combinations may include: performing syntactic analysis on the plurality of word segments to obtain a number of word-segment phrases; and combining the plurality of query elements according to these phrases to obtain the element combinations. Illustratively, if the phrases include "gender not male" and the query elements include "u.sex", "≠" and "u.sex = male", the element combination "u.sex ≠ male" can be derived.
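The combination in the example above may be sketched as follows, under the assumption that syntactic analysis has already grouped the query elements belonging to one phrase into a list. The function combine_phrase and the pattern it recognizes (field, operator, field = value) are illustrative only; other phrase patterns would need their own rules.

```python
from typing import List

OPERATOR_SYMBOLS = {"≠"}  # Assumed set of operator query elements.


def combine_phrase(elements: List[str]) -> List[str]:
    """Merge the pattern [field, operator, 'field = value'] into 'field operator value'."""
    combined: List[str] = []
    i = 0
    while i < len(elements):
        is_pattern = (
            i + 2 < len(elements)
            and elements[i + 1] in OPERATOR_SYMBOLS
            and elements[i + 2].startswith(elements[i] + " = ")
        )
        if is_pattern:
            value = elements[i + 2].split(" = ", 1)[1]
            combined.append(f"{elements[i]} {elements[i + 1]} {value}")
            i += 3
        else:
            combined.append(elements[i])  # Leave unmatched elements unchanged.
            i += 1
    return combined


print(combine_phrase(["u.sex", "≠", "u.sex = male"]))
```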
In a more specific embodiment, determining the query objects includes: for any element combination, processing the element combination into a corresponding query object based on the semantic processing rule corresponding to the entity category involved in the element combination. Illustratively, the element combination "p.amp" may be processed into the query object "SUM(p.amp)".
In a more specific embodiment, constructing the data query script includes: determining the query keyword corresponding to each query object, and forming the data query script based on the query keywords and the query objects. In a more specific embodiment, for each query object, the corresponding query keyword is determined based on the entity category involved in the query object and a mapping relationship between entity categories and query keywords; for example, the SQL keyword corresponding to the query object "u.sex ≠ male" may be determined to be WHERE.
In a more specific embodiment, the data query script is constructed based on a number of query objects and grammar rules of a scripting language.
In this manner, the first query text can be interpretably translated stepwise into a data query script.
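The semantic processing and script construction steps may be sketched in Python as follows. The rule tables SEMANTIC_RULES and KEYWORD_BY_CATEGORY, the function build_script, the field p.order_date and the FROM clause are hypothetical; the filter is written with the SQL operator <> and a quoted literal, which is one assumed way of rendering "u.sex ≠ male" in SQL.

```python
from typing import Callable, Dict, List, Tuple

# Assumed semantic processing rules: entity category -> function producing a query object.
SEMANTIC_RULES: Dict[str, Callable[[str], str]] = {
    "indicator": lambda element: f"SUM({element})",
    "dimension": lambda element: element,
    "filter": lambda element: element,
}
# Assumed mapping relationship between entity categories and SQL keywords.
KEYWORD_BY_CATEGORY: Dict[str, str] = {
    "indicator": "SELECT",
    "dimension": "GROUP BY",
    "filter": "WHERE",
}


def build_script(combinations: List[Tuple[str, str]], from_clause: str) -> str:
    """Turn (element combination, entity category) pairs into a SQL query string."""
    clauses: Dict[str, List[str]] = {"SELECT": [], "WHERE": [], "GROUP BY": []}
    for element, category in combinations:
        query_object = SEMANTIC_RULES[category](element)
        clauses[KEYWORD_BY_CATEGORY[category]].append(query_object)
    # Grouping fields are also selected so that each result row is labelled.
    select_items = clauses["GROUP BY"] + clauses["SELECT"]
    sql = f"SELECT {', '.join(select_items)} FROM {from_clause}"
    if clauses["WHERE"]:
        sql += " WHERE " + " AND ".join(clauses["WHERE"])
    if clauses["GROUP BY"]:
        sql += " GROUP BY " + ", ".join(clauses["GROUP BY"])
    return sql


script = build_script(
    [("p.amp", "indicator"), ("u.sex <> 'male'", "filter"), ("p.order_date", "dimension")],
    from_clause="orders p JOIN users u ON p.user_id = u.id",
)
print(script)
```

Because every clause is produced by an explicit table lookup, each part of the resulting script can be traced back to the query object and rule that generated it.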
Thus, the first configuration information can either be translated directly into the data query script, or first be translated into the first query text and then into the data query script. The data storage system is then queried using the data query script to obtain a first data query result. It is understood that the data query result usually retains the original data format of the data storage system.
Then, a corresponding first result chart is drawn based on the first data query result. It should be understood that "chart" is a generic term covering graphs (also called statistical graphs, data analysis graphs, etc.) and tables (also called statistical tables, data analysis tables, etc.), and a result chart may include either or both. In addition, charts may take various forms: for example, a graph may be a line chart, a pie chart, a bar chart, or the like, and a table may be a cross table, a list table, a pivot table, or the like. In one embodiment, the form of the first result chart may be selected automatically according to the first data query result. In another embodiment, the form of the first result chart may be set by the user. According to one example, the cross table shown in the presentation area 102 of fig. 1 may be drawn based on the first data query result.
In the above embodiment, the first configuration information is directly input by the user, so as to realize data query and related presentation.
In another embodiment, the first query text input by the user may be received first and then converted into the corresponding first configuration information. It is to be understood that converting the first query text into the first configuration information is essentially a conversion from NL to VL, which is the reverse of the above-described conversion from VL to NL. Specifically, in one embodiment, the conversion from NL to VL may be implemented by preset rules; in another embodiment, an intermediate language may be designed such that a syntax tree corresponding to the intermediate language is constructed based on the NL query text, and the corresponding VL configuration information is determined from this syntax tree. According to an example, the query text "the sum of unit prices for each order date and each logistics mode in the last 7 days" input by the user in the text box 502 is received and converted to obtain the configuration information shown in the configuration area 501.
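A rule-based conversion from NL to VL can be illustrated with a single regular-expression rule, as in the Python sketch below. The sentence template ("... for each ... and each ..., filtered by ...") and the function nl_to_vl are assumptions chosen for the sketch; query texts that do not follow the template are rejected rather than guessed at.

```python
import re
from typing import Dict, List

# Assumed sentence template: "<indicators> for each <dimensions>[, filtered by <filters>]".
PATTERN = re.compile(
    r"^(?P<indicators>.+?) for each (?P<dimensions>.+?)(?:, filtered by (?P<filters>.+))?$"
)


def nl_to_vl(text: str) -> Dict[str, List[str]]:
    """Parse NL query text into VL configuration information using one preset rule."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError("query text does not follow the assumed template")
    filters = match["filters"]
    return {
        "indicator": match["indicators"].split(" and "),
        "dimension": match["dimensions"].split(" and each "),
        "filter": filters.split("; ") if filters else [],
    }


print(nl_to_vl("unit price sum for each order date and each logistics mode, "
               "filtered by order date in the last 7 days"))
```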
Further, a first result chart is determined and presented based on the first configuration information or the first query text.
Therefore, the first configuration information and the first result chart can be displayed in the query interface, and preferably, the first query text can also be displayed.
Next, at step S420, a query instruction initiated by a user based on an element within the first result chart is received. Specifically, a chart modification operation sent by a user based on the element is received, and a query instruction is generated based on the chart modification operation.
In one embodiment, the first result chart includes a statistical table, and the elements of the statistical table include a column header or a row header of the statistical table. It is to be understood that a column header refers to the beginning position of a column in the table, for example, see column header 301 in fig. 3; a row header refers to the beginning position of a row in the table. Further, in a specific embodiment, in response to a trigger operation by a user on a column header in the statistical table, a plurality of alternative column header names are displayed, and a trigger operation by the user on any one of these column header names is then received as a chart modification operation, so as to generate a query instruction. It should be understood that the trigger operation may be a click, a long press, a voice command, or the like, and its form is not limited. In one example, as shown in fig. 3, in response to a user click on a column header 301, a plurality of alternative column header names are displayed in a pull-down menu 302, and a click on an intra-group proportion item 303 is then received as the chart modification operation.
In another embodiment, the first result chart comprises a statistical graph, and the elements of the statistical graph comprise coordinate axes or data marks in the statistical graph. Illustratively, the elements in a line graph include coordinate axes, line segments of the line graph, and the like; the elements in a pie chart include sector areas, sector labels, sector percentages, sector color labels, and the like.
Further, in a specific embodiment, in response to a user trigger operation on a coordinate axis in the line graph, an input box is displayed, a coordinate interval input in the input box is received, and a query instruction is generated accordingly. In another specific embodiment, the data re-query is triggered in response to a selection operation of a coordinate interval made by the user on a coordinate axis. In one example, as shown in fig. 6, a selection operation of a coordinate interval (see the shaded rectangle covering the coordinate axis) made by the user on the "order date" coordinate axis is received, and a query instruction is generated accordingly.
In another specific embodiment, in response to a user's trigger operation on a sector in the pie chart, a plurality of alternative sector marks are displayed, and a trigger operation by the user on any one of the sector marks is then received as the chart modification operation. In one example, as shown in fig. 7, in response to a user click on the sector with the category mark Shanghai 701, a plurality of candidate categories are displayed in a drop-down menu; a click on the category mark Beijing 702 is then received, and a query instruction is generated accordingly.
From the above, a query instruction initiated by a user based on an element in a query result chart can be received. It should be understood that the result chart disclosed in the embodiments of this specification is specially designed to support a user in interacting directly with elements in the current result chart, for example by changing the content of an element, so as to initiate a data re-query intuitively and conveniently.
Then, in step S430, based on the received query instruction, the first configuration information is updated and shown as the second configuration information.
Specifically, the configuration information may be updated and displayed based on the chart modification operation indicated by the query instruction. In one embodiment, a configuration component corresponding to the chart element involved in the chart modification operation is determined, and based on this configuration component, the first configuration information is updated and displayed as the second configuration information. In another embodiment, in response to a modification of the chart, all elements in the current chart may be acquired, and after the corresponding configuration components are determined, the second configuration information is assembled to replace the first configuration information. On the other hand, in one embodiment, a pre-established mapping relationship between chart elements and VL graphical symbols is obtained, so that the VL graphical symbol corresponding to the chart element involved in the chart modification operation is determined, and a configuration component is formed and used to update the configuration information.
Further, according to a specific embodiment, the query instruction indicates the addition or replacement of column data in the table, and accordingly, the VL graphical symbols corresponding to the relevant column header names may be determined to form the corresponding configuration components, so as to update and display the configuration information. In one example, as shown in fig. 3, the query instruction indicates the addition of the intra-group proportion item 303; accordingly, the corresponding VL graphical symbol is determined to be "unit price sum-proportion", forming the configuration component "indicator, unit price sum-proportion", and the "unit price sum-proportion" configuration component 305 is then additionally shown in the indicator row of the configuration area.
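As a rough sketch of this update, the following Python code looks up the VL graphical symbol in a pre-established mapping and appends the resulting configuration component to the indicator row. The mapping SYMBOL_BY_MENU_ITEM, the dictionary layout of the configuration information and the function add_indicator are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Config = Dict[str, List[str]]

# Hypothetical pre-established mapping: (column header, menu item) -> VL graphical symbol.
SYMBOL_BY_MENU_ITEM: Dict[Tuple[str, str], str] = {
    ("unit price sum", "intra-group proportion"): "unit price sum-proportion",
}


def add_indicator(config: Config, column_header: str, menu_item: str) -> Config:
    """Return second configuration information with the selected indicator added."""
    symbol = SYMBOL_BY_MENU_ITEM[(column_header, menu_item)]
    updated = {row: list(symbols) for row, symbols in config.items()}  # Copy; keep the input intact.
    if symbol not in updated["indicator"]:
        updated["indicator"].append(symbol)
    return updated


first_config: Config = {
    "indicator": ["unit price sum"],
    "dimension": ["order date", "logistics mode"],
    "filter": ["order date in the last 7 days"],
}
second_config = add_indicator(first_config, "unit price sum", "intra-group proportion")
print(second_config["indicator"])
```

Returning a new configuration object instead of mutating the first one makes it straightforward to compare the two states when redrawing the configuration area.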
In another specific embodiment, the query instruction indicates the selection of a coordinate axis interval in the line graph, and accordingly, the VL graphical symbol corresponding to the selected coordinate axis interval may be determined to form the corresponding configuration component. In one example, as shown in fig. 6, the query instruction indicates the selection of the coordinate axis interval 2022.01.01-2022.01.07 (see the shaded selection box and the date interval at its upper right corner). The VL graphical symbol corresponding to the coordinate axis is then determined to be the order date, forming the configuration components "dimension, order date" and "filter, order date in the last 7 days (between 2022.01.01 and 2022.01.07)", and the configuration information shown in the configuration area is updated accordingly. Visually, the first graphical symbol 601 in the filter row, "order date in the last 3 days (2022-01-05 to 2022-01-07)", is updated to the second graphical symbol 602, "order date in the last 7 days (2022-01-01 to 2022-01-07)".
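The filter replacement in this example may be sketched as follows. The function apply_axis_interval, the dictionary layout of the configuration information and the textual form of the filter are assumptions for illustration; the dates are those of the fig. 6 example.

```python
from datetime import date
from typing import Dict, List

Config = Dict[str, List[str]]


def apply_axis_interval(config: Config, axis: str, start: date, end: date) -> Config:
    """Replace any existing filter on the axis field with the newly selected interval."""
    new_filter = f"{axis} between {start.isoformat()} and {end.isoformat()}"
    # Drop earlier filters on the same field, then add the new one.
    filters = [f for f in config["filter"] if not f.startswith(axis + " ")]
    filters.append(new_filter)
    dimensions = list(config["dimension"])
    if axis not in dimensions:
        dimensions.append(axis)
    return {**config, "dimension": dimensions, "filter": filters}


first_config: Config = {
    "indicator": ["unit price sum"],
    "dimension": ["order date"],
    "filter": ["order date between 2022-01-05 and 2022-01-07"],
}
second_config = apply_axis_interval(first_config, "order date", date(2022, 1, 1), date(2022, 1, 7))
print(second_config["filter"])
```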
In yet another specific embodiment, the query instruction indicates the addition or replacement of a sector, and accordingly, the VL graphical symbol corresponding to the mark of the sector may be determined to form the corresponding configuration component. In one example, as shown in fig. 7, the query instruction indicates that the category mark Shanghai 701 of the sector is replaced by the category mark Beijing 702; accordingly, the VL graphical symbol corresponding to the category mark Beijing is determined to be "city = Beijing", forming the configuration components "dimension, city" and "filter, city = Beijing", and the VL configuration information is updated and displayed. Visually, the graphical symbol 703 of "city = Shanghai" in the filter row is updated to the graphical symbol 704 of "city = Beijing".
Therefore, the VL configuration information can be updated and displayed according to the interactive operation of the visual chart.
Then, in step S440, the first result chart is updated and displayed as a second result chart corresponding to the second configuration information. Specifically, the data storage system may be queried based on the second configuration information to obtain a second data query result, and then a second result chart may be generated according to the second data query result. It should be noted that, for determining the corresponding result chart according to the configuration information, reference may be made to the relevant description in the foregoing embodiment, which is not described herein again.
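To illustrate the re-query in step S440, the following Python sketch uses an in-memory SQLite database as a stand-in for the data storage system. The table orders, its columns and the sample rows are invented for the sketch and do not come from the embodiments; the two date intervals correspond to the first and second configuration information of the fig. 6 example.

```python
import sqlite3
from typing import List, Tuple


def query_unit_price_sum(conn: sqlite3.Connection, start: str, end: str) -> List[Tuple[str, float]]:
    """Query the unit price sum per order date within the filter interval."""
    sql = (
        "SELECT order_date, SUM(unit_price) FROM orders "
        "WHERE order_date BETWEEN ? AND ? "
        "GROUP BY order_date ORDER BY order_date"
    )
    return conn.execute(sql, (start, end)).fetchall()


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, unit_price REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2022-01-02", 10.0), ("2022-01-06", 5.0), ("2022-01-06", 7.0), ("2022-01-07", 3.0)],
)

first_result = query_unit_price_sum(conn, "2022-01-05", "2022-01-07")   # Last 3 days.
second_result = query_unit_price_sum(conn, "2022-01-01", "2022-01-07")  # Last 7 days.
print(first_result)
print(second_result)
```

The second data query result contains the additional date points from which the second result chart would be drawn.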
Further, the first result chart is updated and displayed as the second result chart. In one example, as shown in fig. 3, the table in the chart presentation area changes, mainly through the addition of the unit price sum-proportion column 304. In another example, as shown in fig. 6, the first line graph 603 in the chart presentation area is updated and displayed as the second line graph 604. In yet another example, as shown in fig. 7, a first pie chart 705 in the chart presentation area is updated and displayed as a second pie chart 706.
On the other hand, in one embodiment, this step further includes updating and displaying the first query text as a second query text corresponding to the second configuration information. Specifically, the second configuration information is first converted into the second query text; for the conversion from configuration information to query text, reference may be made to the relevant description in the foregoing embodiments, which is not repeated here. The displayed query text is then updated. In one example, as shown in fig. 6, the content of the text input box in the query interface changes from "the sum of unit prices for each order date of the last 3 days" to "the sum of unit prices for each order date of the last 7 days".
In summary, by using the data query method disclosed in the embodiment of the present specification, a user is supported to directly initiate query through interaction based on elements in a visual chart, and meanwhile, related interaction information can also be synchronously converted into VL and/or NL, so that linkage of the chart, VL and NL triggered by operation of the visual chart is realized, and further, user experience is sufficiently improved.
Corresponding to the data query method, the embodiment of the specification also discloses a data query device.
Fig. 8 is a schematic structural diagram of an apparatus for querying data according to an embodiment. As shown in fig. 8, the apparatus 800 includes: a presentation unit 810 configured to present, in a query interface, first configuration information in a visual language form and a corresponding first result chart; an instruction receiving unit 820 configured to receive a query instruction initiated by a user based on an element in the first result chart; an information updating unit 830 configured to update and display the first configuration information as second configuration information based on the query instruction; and a chart updating unit 840 configured to update and display the first result chart as a second result chart corresponding to the second configuration information.
In one embodiment, the first result chart comprises a statistical table, and the element comprises a column header or a row header in the statistical table.
In one embodiment, the first result chart comprises a statistical graph, and the element comprises a coordinate axis or a data mark in the statistical graph.
In one embodiment, the presentation unit 810 is configured to: receive the first configuration information input by a user based on the query interface; query a data storage system based on the first configuration information to obtain a data query result; and draw the first result chart according to the data query result.
In one embodiment, the presentation unit 810 is further configured to display a first query text in natural language form corresponding to the first configuration information; the apparatus 800 further comprises a text updating unit 850 configured to update and display the first query text as a second query text corresponding to the second configuration information.
In a specific embodiment, the apparatus 800 further includes a processing unit configured to: receive the first query text input by the user based on the query interface, and convert the first query text into the first configuration information; or receive the first configuration information input by the user based on the query interface, and convert the first configuration information into the first query text.
In one example, the apparatus further comprises a data query unit 860 configured to: query a data storage system based on the first query text to obtain a data query result; and draw the first result chart according to the data query result.
In a more specific example, the data query unit 860 is specifically configured to: perform entity recognition on the first query text to obtain a plurality of word segments (tokens) and the entity category corresponding to each word segment; convert the plurality of word segments into a plurality of query elements according to their entity categories, wherein the plurality of query elements are associated with metadata in the data storage system; perform syntactic analysis on the plurality of word segments to obtain a number of element combinations formed from the plurality of query elements; perform semantic analysis on the element combinations to obtain a number of query objects; and construct a data query script based on the query objects, wherein the data query script is used to execute a query operation on the data storage system to obtain the data query result.
In one embodiment, the instruction receiving unit 820 is specifically configured to: receive a chart modification operation issued by a user based on an element in the first result chart; and generate the query instruction based on the chart modification operation. The information updating unit 830 is specifically configured to: determine a configuration component corresponding to the chart modification operation; and update the first configuration information to the second configuration information based on the configuration component.
In summary, by using the data query device disclosed in the embodiment of the present specification, a user is supported to directly initiate query through interaction based on elements in a visual chart, and meanwhile, related interaction information can also be synchronously converted into VL and/or NL, so that linkage of the chart, VL and NL triggered by operation of the visual chart is realized, and further, user experience is sufficiently improved.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 4.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments further describe in detail the objects, technical solutions and advantages of the present invention. It should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.