Movatterモバイル変換


[0]ホーム

URL:


CN109063191B - Method and storage medium for OPTIONAL query on RDF dataset - Google Patents

Method and storage medium for OPTIONAL query on RDF dataset
Download PDF

Info

Publication number
CN109063191B
CN109063191BCN201810995475.7ACN201810995475ACN109063191BCN 109063191 BCN109063191 BCN 109063191BCN 201810995475 ACN201810995475 ACN 201810995475ACN 109063191 BCN109063191 BCN 109063191B
Authority
CN
China
Prior art keywords
query
optional
result
statement
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810995475.7A
Other languages
Chinese (zh)
Other versions
CN109063191A (en
Inventor
陈榕
王宁
陈海波
臧斌宇
管海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiao Tong University
Original Assignee
Shanghai Jiao Tong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiao Tong UniversityfiledCriticalShanghai Jiao Tong University
Priority to CN201810995475.7ApriorityCriticalpatent/CN109063191B/en
Publication of CN109063191ApublicationCriticalpatent/CN109063191A/en
Application grantedgrantedCritical
Publication of CN109063191BpublicationCriticalpatent/CN109063191B/en
Activelegal-statusCriticalCurrent
Anticipated expirationlegal-statusCritical

Links

Images

Landscapes

Abstract

Translated fromChinese

本发明提供了一种在RDF数据集上进行OPTIONAL查询的方法,将RDF数据集加载到内存,与客户端建立网络通信,接收含OPTIONAL查询的查询请求;解析查询请求,将查询请求中的查询语句分解成主查询和OPTIONAL查询两部分;执行主查询,产生主查询结果;执行OPTIONAL查询的分组查询,产生分组查询结果,汇总分组查询结果得到OPTIONAL查询结果,将主查询结果、OPTIONAL查询结果结合形成最终查询结果,将最终查询结果发送客户端。本发明提出的在RDF数据集上进行OPTIONAL查询的方法,通过建立OPTIONAL匹配表,标记OPTIONAL变量,调整查询语句顺序,动态修正查询结果等技术,极大地提高了OPTIONAL查询效率,显著降低了查询处理的开销,加快了查询速度。

Figure 201810995475

The invention provides a method for OPTIONAL query on an RDF data set, which loads the RDF data set into memory, establishes network communication with a client, receives a query request containing OPTIONAL query; parses the query request, and converts the query in the query request The statement is decomposed into two parts: the main query and the OPTIONAL query; the main query is executed to generate the main query results; the grouped query of the OPTIONAL query is executed to generate the grouped query results, the grouped query results are aggregated to obtain the OPTIONAL query results, and the main query results and the OPTIONAL query results are combined. The final query result is formed, and the final query result is sent to the client. The method for OPTIONAL query on RDF data set proposed by the present invention greatly improves OPTIONAL query efficiency and significantly reduces query processing by establishing OPTIONAL matching table, marking OPTIONAL variables, adjusting query statement sequence, dynamically modifying query results and other technologies. The overhead, speeding up the query speed.

Figure 201810995475

Description

Method and storage medium for performing OPTIONAL query on RDF dataset
Technical Field
The invention relates to the field of RDF graph query, in particular to a method for carrying out OPTIONAL query on an RDF data set.
Background
Human beings are in the big data age, and various types of mass data need to be stored and used in appropriate forms. Many of the knowledge-graph, social network data is typically stored in the form of RDF (Resource Description Framework). RDF is a standard model of data recommended by W3C (The World Wide Web Consortium) that can conveniently represent structural data on The World Wide Web. In the RDF model, the interrelated data may be represented as triples (triples). A triple is composed of a subject (subject), a predicate (predicate), and an object (object). For example, using the RDF model, we can conveniently store the data "shanghai transportation university located in shanghai city". Wherein "Shanghai transportation university" is the subject of the triple, "Shanghai City" is the predicate, and "Shanghai City" is the object. If the subject and the object are regarded as vertexes and the predicate is regarded as an edge, the triple can be represented as a directed graph connecting the vertex of the subject and the vertex of the object by the predicate. The entire RDF dataset can be viewed as an intricate, directed graph with vertices connected with various predicates.
SPARQL (simple Protocol and RDF Query language) is a language for querying RDF data. The SPARQL query consists of multiple query statements. The query statement is shaped like a triple in RDF, a subject, a predicate or an object in a triple is replaced with a variable to be queried. For a SPARQL statement, if a known element (one or more of a subject, predicate, or object) in the statement is consistent with a corresponding element in an RDF triple, the data is said to "match" the query statement. The data elements corresponding to the variables in the statement are the query results of the statement.
For the underlying SPARQL query, the relationships between statements are similar to joins (joins) in SQL. For example, statement 1 is executed to generate intermediate result 1, and statement 2 is executed based on each piece of data in intermediate result 1, and if the intermediate result does not match statement 2, the piece of data does not appear in intermediate result 2 after the end of the statement. In this specification, we will refer to the underlying SPARQL query as the "master query".
While for the OPTIONAL query, the relationship between statements is similar to the left outer join in SQL. For example, if statement 1 executes to generate intermediate result 1, and statement 2 executes based on each piece of data in intermediate result 1, and if the intermediate result does not match statement 2, the piece of data still appears in intermediate result 2 after the end of the statement, and only the variable required to be queried by statement 2, referred to as the option variable in this specification, will be replaced with a null value.
A SPARQL query with options typically consists of two parts: main queries and options queries. The main query may contain zero or more query statements, enclosed by curly brackets. Following the main query is one or more OPTIONAL query groups, each marked by an OPTIONAL keyword and bracketed around the query statement. Each query group contains zero or more query statements.
The traditional SPARQL query engine does not consider the characteristics of RDF graph structure data, which results in the need of final join (final join) to generate results, and the cost is very high. For basic SPARQL query, a query method for graph exploration exists, final connection is avoided, and efficiency is high. However, the graph exploration technique cannot be applied to the OPTIONAL query because the special nature of the "not matching" intermediate results is not removed by the OPTIONAL query. How to invent an efficient OPTIONAL query method becomes a technical problem in the current graph query field.
Disclosure of Invention
In view of the deficiencies in the prior art, it is an object of the present invention to provide a method for performing an OPTIONAL query on an RDF dataset.
According to the method for carrying out the OPTIONAL query on the RDF data set, provided by the invention, the method comprises the following steps: a request receiving step: loading the RDF data set into a memory, establishing network communication with a client, and receiving a query request containing an OPTIONAL query; analyzing the request: analyzing the query request, and decomposing a query statement in the query request into a main query part and an OPTIONAL query part; a main inquiry step: executing the main query to generate a main query result; optional query procedure: and executing the grouping query of the OPTIONAL query, generating an OPTIONAL grouping query result, summarizing the OPTIONAL grouping query result to obtain an OPTIONAL query result, combining the main query result and the OPTIONAL query result to form a final query result, and sending the final query result to the client.
Preferably, the OPTIONAL querying step comprises: a query grouping step: grouping the OPTIONAL queries according to a grammar specification, one OPTIONAL query being grouped into one or more OPTIONAL query groups; and (3) executing a query step: executing query on a single OPTIONAL query group to form a first query result, and collecting and summarizing the first query result to form a second query result after all the query is executed by a plurality of OPTIONAL query groups; a progress judgment step: judging whether all the OPTIONAL query groups are executed completely, if the not executed OPTIONAL query groups exist, returning to the step of executing the query and continuing to execute; and if all the query groups are executed, sending the second query result as a final query result to the client.
Preferably, the step of performing a query comprises: a pretreatment step: analyzing the OPTIONAL query statement and sequencing the OPTIONAL query statement; a task decomposition step: decomposing the query task to form one or more query subtasks, and sending the query subtasks to a plurality of working threads for execution; a thread query step: the working thread executes the query statements one by one in a graph exploration mode; and a result summarizing step: collecting query results of the working threads, obtaining summary results of the queries, and forming a first query result; and final result summarizing step: comparing the number of the working threads which are inquired with the total number of the working threads, judging whether the inquiry of all the working threads is finished or not, and if not, returning to the thread inquiry step for continuous execution; if the query is finished, summarizing the first query result to form a second query result.
Preferably, the pre-treating step comprises: and (3) variable analysis step: analyzing an OPTIONAL query statement and marking an OPTIONAL variable; establishing a matching table: establishing a matching table based on the main query result, wherein the line number of the matching table is equal to the line number of the main query result; a sorting step: and performing priority classification on the query sentences, wherein the query sentences with high priority are preferentially executed.
Preferably, the thread querying step includes: and (3) searching and screening: replacing a known variable in a query statement with first data corresponding to an intermediate result, querying the first data in an RDF data set, querying to obtain second data, if the second data is inconsistent with the first data, marking a corresponding unit of a matching table as unmatched, and if the second data is not inconsistent with the first data, marking a corresponding unit of the matching table as matched; interrupting subsequent query operations for statements marked as unmatched in units of the matching table; and (3) unknown variable query step: inquiring whether third data matching the statement exists in the RDF data set or not according to each query statement containing unknown variables and each intermediate result marked as matching in the matching table, and if so, updating the intermediate result of the statement by using the third data; if not, updating the mark of the strip in the matching table to be not matched; and a known variable query step: inquiring whether fourth data matching with the statement exists in the RDF data set or not according to the query statement containing the known variable and the intermediate result marked as matching in the matching table, and if so, keeping the matching mark of the statement in the matching table; if not, the tag of the strip in the match table is updated to a mismatch.
Preferably, the result summarizing step includes: collecting query results of the working threads, obtaining summary results of the queries, and forming a first query result; a working thread recording step: recording the number of the working threads which have returned to the query result, and recording as the queried number; recording the total number of the working threads of the sent tasks, and recording the total number as the total number of the threads; a counting updating step: if the working thread returns the query result, summarizing the query result into a summary result, and increasing the query quantity; a step of summarizing and judging: and if the queried number is equal to the total number of the threads, taking the summary result as a first query result.
According to the present invention, there is provided a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above-described method.
Compared with the prior art, the invention has the following beneficial effects:
1. reordering the query sentences according to the main query result and the query sentences, so that the sentences with higher selectivity can be preferentially executed, unnecessary calculation amount is reduced, and the query speed is increased;
2. the OPTIONAL query task is decomposed into a plurality of subtasks and is executed by a plurality of working threads in parallel, so that the execution efficiency is improved, and the query speed is increased;
3. the method has the advantages that the 'matched' intermediate result is kept in the query process through the OPTIONAL matching table marking, the unmatched intermediate result is removed, the solving range of an OPTIONAL query group is recorded in a mode of marking an OPTIONAL variable, the query result is dynamically corrected in the query process of the OPTIONAL, the OPTIONAL query can be conducted in a graph exploration mode, a final join (final join) step in the prior art is avoided, and the query speed is greatly increased;
5. the problem that the prior art cannot efficiently execute the OPTIONAL query is solved, and reference is provided for the technical scheme of utilizing a graph exploration mode to perform other complex types of queries.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a graphical representation of an RDF data set of an embodiment;
FIG. 2 is a flow chart of an OPTIONAL query on an RDF dataset.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that it would be obvious to those skilled in the art that various changes and modifications can be made without departing from the spirit of the invention. All falling within the scope of the present invention.
According to the method for carrying out the OPTIONAL query on the RDF data set, provided by the invention, the method comprises the following steps: a request receiving step: loading the RDF data set into a memory, establishing network communication with a client, and receiving a query request containing an OPTIONAL query; analyzing the request: analyzing the query request, and decomposing a query statement in the query request into a main query part and an OPTIONAL query part; a main inquiry step: executing the main query to generate a main query result; optional query procedure: and executing the grouping query of the OPTIONAL query, generating an OPTIONAL grouping query result, summarizing the OPTIONAL grouping query result to obtain an OPTIONAL query result, combining the main query result and the OPTIONAL query result to form a final query result, and sending the final query result to the client.
Specifically, the option query step includes: a query grouping step: grouping the OPTIONAL queries according to a grammar specification, one OPTIONAL query being grouped into one or more OPTIONAL query groups; and (3) executing a query step: executing query on a single OPTIONAL query group to form a first query result, and collecting and summarizing the first query result to form a second query result after all the query is executed by a plurality of OPTIONAL query groups; a progress judgment step: judging whether all the OPTIONAL query groups are executed completely, if the not executed OPTIONAL query groups exist, returning to the step of executing the query and continuing to execute; and if all the query groups are executed, sending the second query result as a final query result to the client.
Specifically, the step of executing the query includes: a pretreatment step: analyzing the OPTIONAL query statement and sequencing the OPTIONAL query statement; a task decomposition step: decomposing the query task to form one or more query subtasks, and sending the query subtasks to a plurality of working threads for execution; a thread query step: the working thread executes the query statements one by one in a graph exploration mode; and a result summarizing step: collecting query results of the working threads, obtaining summary results of the queries, and forming a first query result; and final result summarizing step: comparing the number of the working threads which are inquired with the total number of the working threads, judging whether the inquiry of all the working threads is finished or not, and if not, returning to the step of executing the inquiry to continue the execution; if the query is finished, summarizing the first query result to form a second query result.
Specifically, the pretreatment step includes: and (3) variable analysis step: analyzing an OPTIONAL query statement and marking an OPTIONAL variable; establishing a matching table: establishing a matching table based on the main query result, wherein the line number of the matching table is equal to the line number of the main query result; a sorting step: and performing priority classification on the query sentences, wherein the query sentences with high priority are preferentially executed.
Specifically, the thread querying step includes: and (3) searching and screening: replacing a known variable in a query statement with first data corresponding to an intermediate result, querying the first data in an RDF data set, querying to obtain second data, if the second data is inconsistent with the first data, marking a corresponding unit of a matching table as unmatched, and if the second data is not inconsistent with the first data, marking a corresponding unit of the matching table as matched; interrupting subsequent query operations for statements marked as unmatched in units of the matching table; and (3) unknown variable query step: inquiring whether third data matching the statement exists in the RDF data set or not according to each query statement containing unknown variables and each intermediate result marked as matching in the matching table, and if so, updating the intermediate result of the statement by using the third data; if not, updating the mark of the strip in the matching table to be not matched; and a known variable query step: inquiring whether fourth data matching with the statement exists in the RDF data set or not according to the query statement containing the known variable and the intermediate result marked as matching in the matching table, and if so, keeping the matching mark of the statement in the matching table; if not, the tag of the strip in the match table is updated to a mismatch.
Specifically, the result summarizing step includes: collecting query results of the working threads, obtaining summary results of the queries, and forming a first query result; a working thread recording step: recording the number of the working threads which have returned to the query result, and recording as the queried number; recording the total number of the working threads of the sent tasks, and recording the total number as the total number of the threads; a counting updating step: if the working thread returns the query result, summarizing the query result into a summary result, and increasing the query quantity; a step of summarizing and judging: and if the queried number is equal to the total number of the threads, taking the summary result as a first query result.
According to the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method.
According to the method for carrying out OPTIONAL query on the RDF data set, provided by the invention, by the technologies of establishing an OPTIONAL matching table, marking OPTIONAL variables, adjusting the sequence of query statements, dynamically correcting query results and the like, the query efficiency is greatly improved, the query processing expense is obviously reduced, and the query speed is accelerated.
The present invention is explained in more detail below.
When a request is received, the RDF data set is loaded into the memory before the service is provided, and if the system comprises a server, the memory of the server can completely contain the RDF data set. If a plurality of servers are included, each server can load a complete data set, or the data set can be divided, and each server loads a part of the data set. The slicing of the dataset may be uneven, allowing some servers to load more data and others to load less data. The servers and clients in the system should be able to establish network communications based on various types of network communication techniques, including but not limited to TCP and RDMA. The client sends the query request to the server through the network, and the query result is returned to the client through the network.
When parsing a query request, the SPARQL syntax specification should be met. The main query portion may contain 0 to more query statements and the OPTIONAL query may contain a plurality of OPTIONAL query groups, each of which may contain 0 to more query statements. After the server analyzes the query request, the whole query is placed in a query structure, and the main query statement and each OPTIONAL query group are respectively stored in the structure. The OPTIONAL query set is stored in a vector (vector) in groups, and is fetched from the vector one at a time for execution. The query structure contains the id of the query to distinguish it from other queries. And recording the current execution progress in the query structure body to determine the subsequent steps.
When executing the main query, the main query may be implemented in various ways, including but not limited to final join, graph explorer. The main query is typically performed using graph exploration to provide better query performance. The main query result is stored in a vector (result _ table), and the number of rows (n _ rows) and the number of columns (n _ cols) of the query result are recorded in the query structure. The query structure records the column of the query result and the mapping (variable _ to _ column _ map) of the variable. Locating the position of a result in the vector by: the column (col) corresponding to the variable X in the query result is variable _ to _ column _ map [ X ]; the kth result value of the variable X is result table [ (k-1) × n _ cols + col ].
The server parses the options query portion, flagging the options variable. The specific process of the analysis is as follows: the query statement is scanned bar by bar, and for each element of each query statement (triplet), it is first checked whether it is a variable. If so, a lookup is made in the intermediate result whether the variable is a known variable. If the variable is unknown, then see if it has been marked. If not, it is marked as an OPTIONAL variable. The tags are stored in a set (set). The server establishes an OPTIONAL matching table based on the main query result. The OPTIONAL matching table is stored in a vector data structure (vector), and the arrangement of the matching table is equal to the number of lines of the main query result and corresponds to each line of the main query result one by one. At initialization, the value of each row in the match table is set to true (true). The server reorders the OPTIONAL statements and optimizes the query. When reordering, the OPTIONAL statements are divided into three priorities, wherein the high priority is executed preferentially, the medium priority is executed after the high priority, and the low priority is executed finally. The priority is generated according to the following principle: high priority: the subject and the object in the query statement are constant and known variables (index to below, below to const, const to below); medium priority: in the query statement, the subject is a constant or a known variable, and the object is an unknown variable (index to unknown, neighbor to unknown); low priority: the query statement is mainly unknown variable (unknown to unknown, unknown to const)
The respective work threads may be distributed on the same machine or may be distributed on a plurality of machines. If the working thread is on multiple machines, the multiple machines can all load complete RDF data, or the RDF data can be divided, and each machine stores a part of RDF data. If the working thread is on a plurality of machines, and each machine stores partial RDF data, the server divides the task by considering the distribution condition of the data. In particular, all the data required for a task assigned to a machine should be found on that machine.
When the thread queries, pruning is performed by using the OPTIONAL matching table in the query process, and the subsequent statements are not calculated for the intermediate results marked as 'unmatched' in the matching table. Specifically, the worker thread replaces the known variable of the current query statement triplet with the corresponding data in an intermediate result (marked as true in the match table) and then uses it to find the result in the dataset. If the found result is inconsistent with the intermediate result, the mismatching occurs, and the corresponding row in the matching table is marked as false (false); for query statements that involve unknown OPTIONAL variables, the RDF graph data is looked up for the presence of data that matches the statement based on the intermediate results of each item labeled "match" in the match table. If yes, updating the intermediate result; if not, marking the result as not matched; for query statements that refer to known variables, the RDF graph data is looked up for the presence of data that matches the current statement based on the intermediate results of each entry marked as a "match" in the match table. If yes, the intermediate result accords with the query condition, and the matching state is kept; if not, the intermediate result violates the statement, marks the intermediate result as 'not matched', and corrects the OPTIONAL variable of the intermediate result to be null. In particular, it will be checked whether each column of the intermediate result belongs to the OPTIONAL variable, and if so, the value of the column is modified to null (BLANK).
After the thread query is finished, the server records the number of the working threads which have returned results at present and the number of the working threads of the total scheduled tasks; when a worker thread returns a query result, adding the result into the summary result, and increasing the number count of the worker threads which have returned the result; when the number of work threads that have returned results is equal to the number of work threads for the total scheduled tasks, an aggregated result is declared to have been generated.
After an option query group is executed, the progress in the query structure is increased (OPTIONAL _ step + ═ 1). By comparing the number of progress (OPTIONAL _ step) and option query groups, it can be determined whether all queries have been completed. If not, the next unexecuted OPTIONAL query set is taken out from the vector, and the unexecuted OPTIONAL query set is continuously queried. If all the data is finished, the summary result is returned to the client side in a network mode.
In summary, the method for performing the optimal query on the RDF dataset according to the present invention greatly improves the optimal query efficiency, significantly reduces the query processing overhead, and accelerates the query speed by establishing the optimal matching table, marking the optimal variable, adjusting the query statement sequence, dynamically correcting the query result, and other techniques.
Specific examples of queries are listed below. As shown in Table one, the main query portion of the statement requires querying all entities in the RDF dataset that are of the identity type "software engineer" and querying the entities of their native school. The first OPTIONAL query team asked to query the entities of his Master school and asked that the Hospital school is a double university and that the address of the Master school was in Shanghai. The second option query group asks for entities in his doctor's school. The RDF data set is shown in FIG. 1, and FIG. 1 is converted into a table form, shown in Table two.
Table one query SPARQL description
Figure BDA0001781752700000081
Tabular form of the Table two RDF dataset
Figure BDA0001781752700000091
Table three final query results
?X?B?M?D
Engineer armorFUDAN UniversityShanghai Jiaotong University-
Engineer BFUDAN University-FUDAN University
Engineer CFUDAN University--
Engineer DShanghai Jiaotong University-Peking University
Engineer WuShanghai Jiaotong University--
As shown in table three, after the main query is executed, we will get the main query result, two columns of five rows (engineer a-e) (. Then execute the first OPTIONAL query set, it can be seen that Eethanbutyng all has data for Master school, however the second statement in the query set limits that the subject school should be a double university, so the two lines of Butyleng results are marked as not matching and their OPTIONAL variable is dynamically modified to null. The third statement requires that the address of the master school should be Shanghai, so the line with B is marked as a mismatch and its OPTIONAL variable is dynamically modified to null. The intermediate result becomes three columns of five rows after execution of the OPTIONAL query set, and the third column has only the value of A. Similarly, the results of the second OPTIONAL query group may be obtained, and the final results obtained.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (4)

Translated fromChinese
1.一种在RDF数据集上进行OPTIONAL查询的方法,其特征在于,包括:1. a method for carrying out OPTIONAL query on RDF data set, is characterized in that, comprises:接收请求步骤:将RDF数据集加载到内存,与客户端建立网络通信,接收含OPTIONAL查询的查询请求;Steps of receiving the request: load the RDF dataset into the memory, establish network communication with the client, and receive the query request containing the OPTIONAL query;解析请求步骤:解析查询请求,将查询请求中的查询语句分解成主查询和OPTIONAL查询两部分;Parsing request steps: Parse the query request, and decompose the query statement in the query request into two parts: the main query and the OPTIONAL query;主查询步骤:执行主查询,产生主查询结果;Main query step: execute the main query and generate the main query result;OPTIONAL查询步骤:执行OPTIONAL查询的分组查询,产生OPTIONAL分组查询结果,汇总OPTIONAL分组查询结果得到OPTIONAL查询结果,将主查询结果、OPTIONAL查询结果结合形成最终查询结果,将最终查询结果发送客户端;OPTIONAL query steps: Execute the grouped query of the OPTIONAL query, generate the OPTIONAL grouped query result, aggregate the OPTIONAL grouped query results to obtain the OPTIONAL query result, combine the main query result and the OPTIONAL query result to form the final query result, and send the final query result to the client;还包括执行查询步骤包括:Also included are performing query steps including:预处理步骤:解析OPTIONAL查询语句,对OPTIONAL查询语句进行排序;Preprocessing step: parse the OPTIONAL query statement and sort the OPTIONAL query statement;任务分解步骤:分解查询任务,形成一个或多个查询子任务,将查询子任务发送给多个工作线程执行;Task decomposition steps: decompose the query task, form one or more query subtasks, and send the query subtasks to multiple worker threads for execution;线程查询步骤:工作线程以图探索的方式逐条执行查询语句;Thread query step: Worker threads execute query statements one by one in the way of graph exploration;结果汇总步骤:收集工作线程的查询结果,获得查询的汇总结果,形成第一查询结果;The result summarizing step: collecting the query results of the worker threads, obtaining the summary results of the query, and forming the first query result;最终结果汇总步骤:将已完成查询的工作线程的数量与工作线程的总数量进行对比,判断是否完成全部工作线程的查询,若未完成,则返回线程查询步骤继续执行;若完成,则汇总第一查询结果,形成第二查询结果;Final result summary step: Compare the number of worker threads that have completed the query with the total number of worker threads to determine whether the query of all worker threads is completed. If not, return to the thread query step to continue execution; if completed, summarize the first a query result to form a second query result;所述预处理步骤包括:The preprocessing steps include:变量解析步骤:解析OPTIONAL查询语句,标记OPTIONAL变量;Variable parsing steps: Parse the OPTIONAL query statement and mark the OPTIONAL variable;建立匹配表步骤:基于主查询结果,建立匹配表,所述匹配表的行数等于主查询结果的行数;The step of establishing a matching table: based on the main query result, a matching table is established, and the number of rows in the matching table is equal to the number of rows in the main query result;排序步骤:对查询语句进行优先等级划分,优先等级高的查询语句优先执行;Sorting step: prioritize query statements, and query statements with higher priority levels are executed first;所述线程查询步骤包括:The thread query step includes:搜索筛选步骤:将查询语句中的已知变量替换为一条中间结果对应的第一数据,将所述第一数据在RDF数据集中查询,查询得到的第二数据,若第二数据与第一数据矛盾,则对匹配表的对应单元标记为不匹配,若第二数据与第一数据不矛盾,则对匹配表的对应单元进行标记为匹配;对匹配表的单元中标记为不匹配的语句中断后续的查询操作;Search and screening step: replace the known variable in the query statement with the first data corresponding to an intermediate result, query the first data in the RDF data set, and query the obtained second data, if the second data is the same as the first data If there is a contradiction, the corresponding unit of the matching table is marked as not matching; if the second data is not inconsistent with the first data, the corresponding unit of the matching table is marked as matching; the statement marked as mismatching in the unit of the matching table is interrupted subsequent query operations;未知变量查询步骤:对包含未知变量的查询语句,基于每一条在匹配表中标记为匹配的中间结果,在RDF数据集中查询是否存在匹配本条语句的第三数据,若存在,则用第三数据更新该条语句的中间结果;若不存在,则将该条在匹配表中的标记更新为不匹配;Unknown variable query step: For query statements containing unknown variables, based on each intermediate result marked as matching in the matching table, query whether there is a third data matching this statement in the RDF data set, if so, use the third data Update the intermediate result of the statement; if it does not exist, update the mark in the matching table to not match;已知变量查询步骤:对包含已知变量的查询语句,基于每一条在匹配表中标记为匹配的中间结果,在RDF数据集中查询是否存在匹配本条语句的第四数据,若存在,则保持该条在匹配表中的匹配标记;若不存在,则将该条在匹配表中的标记更新为不匹配。Known variable query step: For query statements containing known variables, based on each intermediate result marked as matching in the matching table, query whether there is a fourth data matching this statement in the RDF data set, and if so, keep this value. The matching token of the entry in the match table; if it does not exist, the token in the match table of the entry is updated to not match.2.根据权利要求1所述的在RDF数据集上进行OPTIONAL查询的方法,其特征在于,所述OPTIONAL查询步骤包括:2. the method for carrying out OPTIONAL query on RDF data set according to claim 1, is characterized in that, described OPTIONAL query step comprises:查询分组步骤:对OPTIONAL查询按照语法规范进行分组,一个OPTIONAL查询分为一个或多个OPTIONAL查询组;Query grouping steps: OPTIONAL queries are grouped according to syntax specifications, and an OPTIONAL query is divided into one or more OPTIONAL query groups;执行查询步骤:对单个OPTIONAL查询组执行查询,形成第一查询结果,当若干个OPTIONAL查询组全部执行查询后,收集汇总第一查询结果,形成第二查询结果;Execute the query step: execute a query on a single OPTIONAL query group to form a first query result, and after all of the several OPTIONAL query groups execute the query, collect and summarize the first query results to form a second query result;进度判断步骤:判断是否执行完所有OPTIONAL查询组,若存在未执行的OPTIONAL查询组,则返回执行查询步骤继续执行;若已经执行完所有查询组,则将第二查询结果作为最终查询结果发送客户端。Progress judgment step: Judging whether all OPTIONAL query groups have been executed, if there is an unexecuted OPTIONAL query group, return to the execution query step to continue execution; if all query groups have been executed, send the second query result as the final query result to the client end.3.根据权利要求1所述的在RDF数据集上进行OPTIONAL查询的方法,其特征在于,所述结果汇总步骤包括:收集工作线程的查询结果,获得查询的汇总结果,形成第一查询结果;3. the method for carrying out OPTIONAL query on RDF data set according to claim 1, is characterized in that, described result aggregating step comprises: collect the query result of worker thread, obtain the summary result of query, form the first query result;工作线程记录步骤:记录已经返回查询结果的工作线程的数量,记为已查询数量;记录被发送任务的工作线程的总数量,记为线程总数量;Worker thread recording step: record the number of worker threads that have returned query results, and record as the number of queries; record the total number of worker threads that have been sent tasks, record it as the total number of threads;计数更新步骤:若工作线程返回查询结果,则将所述查询结果汇总到汇总结果中,增加已查询数量;Count update step: if the worker thread returns the query result, the query result is aggregated into the aggregated result to increase the number of queries;汇总判断步骤:若已查询数量与线程总数量相等,则将汇总结果作为第一查询结果。Aggregation judgment step: If the number of queries is equal to the total number of threads, the aggregated result is taken as the first query result.4.一种存储有计算机程序的计算机可读存储介质,其特征在于,所述计算机程序被处理器执行时实现权利要求1至3中任一项所述的方法的步骤。4. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 3 are implemented.
CN201810995475.7A2018-08-292018-08-29 Method and storage medium for OPTIONAL query on RDF datasetActiveCN109063191B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN201810995475.7ACN109063191B (en)2018-08-292018-08-29 Method and storage medium for OPTIONAL query on RDF dataset

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN201810995475.7ACN109063191B (en)2018-08-292018-08-29 Method and storage medium for OPTIONAL query on RDF dataset

Publications (2)

Publication NumberPublication Date
CN109063191A CN109063191A (en)2018-12-21
CN109063191Btrue CN109063191B (en)2021-07-06

Family

ID=64756509

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN201810995475.7AActiveCN109063191B (en)2018-08-292018-08-29 Method and storage medium for OPTIONAL query on RDF dataset

Country Status (1)

CountryLink
CN (1)CN109063191B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN1647070A (en)*2001-06-222005-07-27诺萨·欧莫贵System and method for knowledge retrieval, management, delivery and presentation
CN101593180A (en)*2008-05-302009-12-02国际商业机器公司The SPARQL inquiry is changed into the method and apparatus of SQL query
CN102609445A (en)*2010-12-102012-07-25微软公司Matching queries to data operations using query templates
CN103116625A (en)*2013-01-312013-05-22重庆大学Volume radio direction finde (RDF) data distribution type query processing method based on Hadoop
CN104809168A (en)*2015-04-062015-07-29华中科技大学Partitioning and parallel distribution processing method of super-large scale RDF graph data
CN106599091A (en)*2016-11-242017-04-26上海交通大学Storage and indexing method of RDF graph structures stored based on key values
WO2017122904A1 (en)*2016-01-112017-07-20한국과학기술원Open information extraction method and system for extracting reified ternary relationship

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
AUPR796701A0 (en)*2001-09-272001-10-25Plugged In Communications Pty LtdDatabase query system and method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN1647070A (en)*2001-06-222005-07-27诺萨·欧莫贵System and method for knowledge retrieval, management, delivery and presentation
CN101593180A (en)*2008-05-302009-12-02国际商业机器公司The SPARQL inquiry is changed into the method and apparatus of SQL query
CN102609445A (en)*2010-12-102012-07-25微软公司Matching queries to data operations using query templates
CN103116625A (en)*2013-01-312013-05-22重庆大学Volume radio direction finde (RDF) data distribution type query processing method based on Hadoop
CN104809168A (en)*2015-04-062015-07-29华中科技大学Partitioning and parallel distribution processing method of super-large scale RDF graph data
WO2017122904A1 (en)*2016-01-112017-07-20한국과학기술원Open information extraction method and system for extracting reified ternary relationship
CN106599091A (en)*2016-11-242017-04-26上海交通大学Storage and indexing method of RDF graph structures stored based on key values

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于RDF(S)资源查询的实体关系模型构建";高相斌;《中国优秀硕士学位论文全文数据库 信息科技辑》;20171015;全文*

Also Published As

Publication numberPublication date
CN109063191A (en)2018-12-21

Similar Documents

PublicationPublication DateTitle
Kim et al.Taming subgraph isomorphism for RDF query processing
US10515118B2 (en)Processing a data flow graph of a hybrid flow
CN103678520B (en)A kind of multi-dimensional interval query method and its system based on cloud computing
US9223828B2 (en)Data flow graph optimization using adaptive rule chaining
US10769147B2 (en)Batch data query method and apparatus
CN105550225B (en)Index structuring method, querying method and device
US9870382B2 (en)Data encoding and corresponding data structure
US20190317938A1 (en)Method, program, and system for automatic discovery of relationship between fields in environment where different types of data sources coexist
CN103049521B (en)Virtual table directory system and the method for many attributes multiple condition searching can be realized
WO2015126343A1 (en)Apparatus and method for processing distributed relational algebra operators in a distributed database
US9594804B2 (en)Dynamic reordering of operations in a query plan
US9824121B2 (en)Aggregate data streams in relational database systems
CN111125199B (en)Database access method and device and electronic equipment
CN105701128B (en) A query statement optimization method and device
CN105550332A (en)Dual-layer index structure based origin graph query method
JP2016529586A (en) System and method for tuning multi-store systems and accelerating big data query workloads
GB2609831A (en)Multi-value primary keys for plurality of unique identifiers of entities
CN114116785A (en) A Distributed SPARQL Query Optimization Method Based on Minimum Attribute Cut
CN103605750B (en)A kind of quick distributed data paging method
CN114297260B (en) Distributed RDF data query method, device and computer equipment
CN109063191B (en) Method and storage medium for OPTIONAL query on RDF dataset
US8862606B1 (en)Executing correlated and multi-row subqueries in a MPP database
US11386155B2 (en)Filter evaluation in a database system
CN107169138A (en)A kind of data distributing method of Based on Distributed memory database query engine
GB2596769A (en)Extensible data skipping

Legal Events

DateCodeTitleDescription
PB01Publication
PB01Publication
SE01Entry into force of request for substantive examination
SE01Entry into force of request for substantive examination
GR01Patent grant
GR01Patent grant

[8]ページ先頭

©2009-2025 Movatter.jp