Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments are not representative of all implementations consistent with one or more embodiments of the present application. Rather, they are merely examples consistent with aspects of one or more embodiments of the present application.
It should be noted that in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than described herein. Furthermore, an individual step described in this application may be broken down into multiple steps in other embodiments, and multiple steps described in this application may be combined into a single step in other embodiments.
Database sharding is a technique of partitioning the data in a database and storing it on multiple nodes, which aims to solve the problem that a single database cannot handle large-scale data and highly concurrent access. In a sharded database, the data is partitioned into a plurality of data shards, each data shard containing a portion of the data; each data shard may be stored on a different node. Typically, each data shard may be processed independently.
A write transaction refers to the process of performing modification operations in a database. Write transactions may include insert, update, delete, and other operations that actually modify the data. Write transactions need to satisfy characteristics such as atomicity, consistency, isolation, and durability, ensuring the correctness and integrity of the data. In a write transaction, the database system treats all write operations as one indivisible unit that either all succeeds or all fails, to ensure data consistency.
Commit of a transaction refers to the process of permanently saving all database operations in one transaction to the database. During execution of a transaction, all operations are performed only in a temporary storage area, and the results of the operations are not written into the permanent storage area of the database until the transaction is committed. When execution of a transaction is completed and validated, the transaction may be committed, meaning that the modifications made by the transaction will be permanently saved to the database and made visible to other users. After the transaction is committed, the state of the database will be updated to reflect the changes made by the transaction. The commit of a transaction is an important step in ensuring the consistency and integrity of the database: it ensures that a set of related operations either all execute successfully or are rolled back completely, thereby avoiding inconsistency and corruption of the data.
A write transaction for data stored in a database may be split into sub-transactions, each for an individual data shard of the data stored in the database. When all sub-transactions split from a write transaction have been executed, the execution of the write transaction is considered complete, and after verification, the write transaction can be committed.
In order to ensure the correctness and isolation of the data, for a sub-transaction split from a write transaction and targeting a data shard, the data corresponding to the sub-transaction in that data shard needs to be locked first, so that the sub-transaction holds a lock; then, after the write transaction is committed, the data is unlocked, i.e. the lock is released by the sub-transaction.
When multiple write transactions are concurrent, deadlock may occur if sub-transactions split from the write transactions cannot properly and orderly hold and release locks. In this case, the flow of the write transaction may be blocked, resulting in the system not continuing to function properly.
One technical scheme for deadlock detection is as follows: a detection client running on each partition management node detects sub-transactions in its lock waiting queue whose lock waiting duration has reached a deadlock detection threshold, and sends the generated lock waiting relationships corresponding to those sub-transactions to a detection server corresponding to the detection clients; the detection server updates a global lock waiting relationship based on the received lock waiting relationships and performs deadlock detection on the sub-transactions based on the updated global lock waiting relationship; if a sub-transaction is detected to trigger a deadlock, the detection client may delete that sub-transaction from the lock waiting queue.
In this way, while deadlock detection is implemented and the flow of write transactions is prevented from being blocked, deadlock detection is performed only on sub-transactions whose lock waiting duration has reached the deadlock detection threshold, so that a portion of unnecessary deadlock detection can be filtered out. In addition, the deadlock detection threshold can be adjusted according to the actual deadlock detection requirements, so that the sensitivity of the deadlock detection is controlled.
The present application also provides another technical solution for deadlock detection, in which a detection program running on each partition management node detects sub-transactions in the lock waiting queue whose lock waiting duration has reached a deadlock detection threshold, generates a global lock waiting relationship based on the generated lock waiting relationships corresponding to those sub-transactions and the lock waiting relationships synchronized from the other partition management nodes, and performs deadlock detection on the sub-transactions based on the global lock waiting relationship; if the detection program detects that a sub-transaction triggers a deadlock, the detection program may delete that sub-transaction from the lock waiting queue.
In this way, on the one hand, while deadlock detection is implemented and the flow of write transactions is prevented from being blocked, deadlock detection is performed only on sub-transactions whose lock waiting duration has reached the deadlock detection threshold, so that a portion of unnecessary deadlock detection can be filtered out. On the other hand, the deadlock detection threshold can be adjusted according to the actual deadlock detection requirements, so that the sensitivity of the deadlock detection is controlled to adapt to different deadlock detection environments. For example, if deadlocks occur frequently, the deadlock detection threshold may be lowered to discover deadlocks faster. On yet another hand, by adopting a non-central deadlock detection mode in which the detection programs running on the partition management nodes perform the deadlock detection themselves, the problem of untimely deadlock detection caused by a blocked central detection server can be avoided.
Referring to fig. 1 and 2, fig. 1 is a schematic diagram of one database system according to an exemplary embodiment of the present application, and fig. 2 is a schematic diagram of another database system according to an exemplary embodiment of the present application.
As shown in fig. 1, in a database system, the database may be carried by one physical host. Several virtual hosts (e.g., Host 0, Host 1, ..., Host N) can be created on the physical host, and a shard management node can be created based on one virtual host, for example: a shard management node may be created using the resources of a virtual host, where the shard management node may be a module created on the virtual host for managing data shards. The data stored in the database may be partitioned into data shards (e.g., Shard 0, Shard 1, ..., Shard N); each data shard may be stored on a different shard management node, with one shard management node storing and managing one data shard.
As shown in FIG. 2, in a database system, the database may be carried by a host cluster, which may be made up of several physical hosts (e.g., Host 0, Host 1, ..., Host N). A shard management node may be created based on a physical host, for example: a shard management node may be created using the resources of a physical host, where the shard management node may be a module created on the physical host for managing data shards. The data stored in the database may be partitioned into data shards (e.g., Shard 0, Shard 1, ..., Shard N); each data shard may be stored on a different shard management node, with one shard management node storing and managing one data shard.
In some embodiments, the database may be a graph database. A graph database is used to store and process data with a graph structure, and represents the data as a combination of nodes and edges, where the nodes represent entities and the edges represent relationships between the entities. In a graph database, each data shard may contain a portion of the graph's nodes and edges, as well as their associated attributes, labels, and so on. Thus, in graph databases, it is often the case that one write transaction involves multiple data shards.
In some embodiments, the database may be a distributed database. In this case, a physical host as shown in fig. 1 or a host cluster as shown in fig. 2 may be used as a node device in the cluster of devices carrying the distributed database. The node device may specifically be a master device elected from the node devices comprised by the device cluster.
For example, the database may be a distributed database based on a Raft algorithm. The Raft algorithm is a consensus algorithm for achieving consistency in a distributed system, ensuring data consistency and high availability among multiple nodes. The Raft algorithm divides nodes in a distributed system into a leader (leader), a follower (follower), and a candidate (candidate). Under normal circumstances, one node is elected as the leader, responsible for receiving and processing requests from clients, and copying logs to other nodes. When the leader fails or cannot work normally, the rest nodes start the election process to elect a new leader. In this case, a physical host as shown in fig. 1 or a host cluster as shown in fig. 2 may be used as a master device (i.e., leader) selected from the node devices included in the cluster of devices carrying the distributed database by the Raft algorithm.
Referring to fig. 3 in conjunction with fig. 1 and 2, fig. 3 is a flowchart illustrating a deadlock detection method according to an exemplary embodiment of the present application.
In this embodiment, a physical host as shown in fig. 1 or a host cluster as shown in fig. 2 may be referred to as a target electronic device. That is, the target electronic device may be configured to carry a database, and the target electronic device may include a number of shard management nodes configured to manage data shards of data stored in the database.
A write transaction for data stored in the database may be split into sub-transactions for each data fragment of the data stored in the database. In this case, one shard management node may execute a sub-transaction for the data shard corresponding to this shard management node (i.e., the data shard managed by this shard management node) split from the write transaction for the data stored in the database.
Referring to fig. 4, fig. 4 is a schematic diagram of a target electronic device according to an exemplary embodiment of the present application.
As shown in fig. 4, each of the plurality of fragment management nodes included in the target electronic device may run a detection client, and the target electronic device may also run a detection server corresponding to the detection clients.
In some embodiments, the detection server may operate on a fragment management node specified in the plurality of fragment management nodes.
Fig. 4 shows a case where the detection server runs on a designated fragment management node. In other cases, the detection server may run on the target electronic device independent of the plurality of fragment management nodes, for example: the detection server may be another module created on the target electronic device independent of the several fragment management nodes for collecting the information sent by the detection client and performing deadlock detection accordingly.
In some embodiments, the specified fragment management node may be the fragment management node that manages the first data fragment (e.g., Shard 0) stored in the database.
The deadlock detection method described above may be applied to a detection client running on any one of the above-described several fragment management nodes (which may be referred to as a target fragment management node). The deadlock detection method may include:
step 302: detecting whether the lock waiting time length of a first sub-transaction in a lock waiting queue maintained locally reaches a preset deadlock detection threshold; the first sub-transaction is a sub-transaction which is split from a write transaction for data stored in the database and is for data fragments corresponding to the target fragment management node.
In this embodiment, the target fragment management node may maintain a lock waiting queue locally. The sub-transactions in the lock waiting queue are split from write transactions for the data stored in the database and target the data fragment corresponding to the target fragment management node. When multiple sub-transactions request the lock on a certain piece of data (e.g., a node or an edge of a graph) in the data fragment corresponding to the target fragment management node at the same time, if the lock is already occupied, the sub-transactions requesting the lock are placed in the lock waiting queue corresponding to that lock until the lock becomes available.
The detection client running on the target fragment management node can detect whether the lock waiting duration of a certain sub-transaction (which may be called a first sub-transaction) in the lock waiting queue reaches a preset deadlock detection threshold; the lock waiting duration refers to the length of time that the sub-transaction has been waiting for the lock in the lock waiting queue. The deadlock detection threshold can be set by a user according to the actual deadlock detection requirements, or can be a system default value, which is not limited in this application.
Step 304: and if the lock waiting time length reaches the deadlock detection threshold, generating a lock waiting relation corresponding to the first sub-transaction, sending the lock waiting relation to the detection server, enabling the detection server to respond to the received lock waiting relation, updating a global lock waiting relation maintained locally based on the lock waiting relation, carrying out deadlock detection on the first sub-transaction based on the global lock waiting relation, and returning a deadlock detection result.
In this embodiment, when the detection client running on the target partition management node detects that the lock waiting duration of the first sub-transaction reaches the deadlock detection threshold, a lock waiting relationship corresponding to the first sub-transaction may be generated; the lock waiting relationship describes the lock waiting situation between the first sub-transaction and other sub-transactions.
For example, assuming that sub-transaction A currently holds lock 1, if sub-transaction B needs lock 1, it must wait for sub-transaction A to release lock 1; the lock waiting relationship in which sub-transaction B waits for sub-transaction A may therefore be expressed as sub-transaction B→sub-transaction A.
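For illustration only, the following is a minimal Python sketch of how a detection client might maintain such a lock waiting queue and generate lock waiting relationships for sub-transactions whose lock waiting duration has reached the deadlock detection threshold. The names, fields, and threshold value are hypothetical and are not taken from the embodiments described above.

import time
from dataclasses import dataclass

# Hypothetical threshold; in practice it is configurable per the actual
# deadlock detection requirements, or a system default value.
DEADLOCK_DETECTION_THRESHOLD_S = 1.0

@dataclass
class WaitingSubTransaction:
    txn_id: str          # transaction identifier of the waiting sub-transaction
    holder_txn_id: str   # transaction identifier of the sub-transaction holding the lock
    shard_id: str        # identifier of the data fragment the sub-transaction targets
    enqueued_at: float   # time at which the sub-transaction entered the lock waiting queue

def collect_wait_relationships(lock_wait_queue):
    """Scan the locally maintained lock waiting queue and, for each sub-transaction
    whose lock waiting duration has reached the threshold, generate a lock waiting
    relationship (waiter -> holder) to be reported for deadlock detection."""
    now = time.monotonic()
    relationships = []
    for waiter in lock_wait_queue:
        if now - waiter.enqueued_at >= DEADLOCK_DETECTION_THRESHOLD_S:
            relationships.append((waiter.txn_id, waiter.holder_txn_id, waiter.shard_id))
    return relationships

In the example of the preceding paragraph, such a scan would produce the pair (sub-transaction B, sub-transaction A) once sub-transaction B has waited for lock 1 long enough.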
Subsequently, the detection client running on the target partition management node may send a lock waiting relationship corresponding to the first sub-transaction to the detection server, so as to request the detection server to perform deadlock detection based on the lock waiting relationship. Specifically, the detection client may generate a detection request including the lock wait relationship, and send the detection request to the detection server.
The detection server can maintain the global lock waiting relationship locally. The global lock waiting relationship may be formed by the detection server integrating the lock waiting relationships sent by the detection clients running on the individual fragment management nodes.
When the detection server receives the lock waiting relationship corresponding to the first sub-transaction, the detection server may respond to the lock waiting relationship, update the global lock waiting relationship based on the lock waiting relationship, and perform deadlock detection on the first sub-transaction based on the updated global lock waiting relationship. Subsequently, the detection server may return the deadlock detection result for the first sub-transaction to the detection client running on the target fragment management node.
In practical applications, performing deadlock detection on the first sub-transaction based on the global lock waiting relationship may specifically be detecting, based on the global lock waiting relationship, whether there exists a cyclic lock waiting relationship that takes the first sub-transaction as both its starting point and its end point.
For example, sub-transaction B→sub-transaction A→sub-transaction B is a cyclic lock wait relationship starting at sub-transaction B and ending at sub-transaction B.
In some embodiments, the lock wait relationship may include an association between a transaction identification of the first sub-transaction and a transaction identification (e.g., a transaction number, etc.) of the sub-transaction holding the lock (which may be referred to as a second sub-transaction).
Accordingly, the global lock waiting relationship may include a directed graph constructed based on the lock waiting relationships sent by the detection clients running on the respective fragment management nodes. Each vertex in the directed graph is the transaction identifier of a sub-transaction, and each directed edge in the directed graph points from the transaction identifier of the sub-transaction waiting for a lock to the transaction identifier of the sub-transaction holding that lock.
In an actual application, the lock waiting relationship may further include a partition identifier of a data partition to which the first sub-transaction belongs, so as to divide the global lock waiting relationship according to the data partition.
Referring to fig. 5, fig. 5 is a schematic diagram illustrating a global lock wait relationship according to an exemplary embodiment of the present application.
As shown in fig. 5, in the directed graph serving as the global lock waiting relationship, vertex A represents sub-transaction A, vertex B represents sub-transaction B, and vertex C represents sub-transaction C. The directed edge between vertex C and vertex B points from vertex C to vertex B, meaning that sub-transaction C waits for sub-transaction B; the directed edge between vertex B and vertex A points from vertex B to vertex A, meaning that sub-transaction B waits for sub-transaction A.
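The following sketch, under the same hypothetical naming as above, shows one possible way for the detection server to maintain the global lock waiting relationship as a directed graph keyed by transaction identifiers; the class and method names are illustrative only.

from collections import defaultdict

class GlobalWaitForGraph:
    """Directed wait-for graph: an edge txn_a -> txn_b means that sub-transaction
    txn_a is waiting for a lock held by sub-transaction txn_b."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_relationship(self, waiter_txn_id, holder_txn_id):
        # Called when a detection client reports a new lock waiting relationship.
        self.edges[waiter_txn_id].add(holder_txn_id)

    def remove_waiter(self, waiter_txn_id):
        # Called when the sub-transaction has finished executing or has been found
        # to trigger a deadlock, so that the stale relationship is not detected again.
        self.edges.pop(waiter_txn_id, None)

Under this sketch, the situation of fig. 5 corresponds to calling add_relationship('C', 'B') and add_relationship('B', 'A').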
When deadlock detection is performed on the first sub-transaction based on the global lock waiting relationship, loop detection may be performed on the directed graph serving as the global lock waiting relationship, to detect whether the directed graph contains a loop that takes the transaction identifier of the first sub-transaction as its starting point (and, being a loop, also as its end point). If such a loop exists in the directed graph, it may be determined that the first sub-transaction triggers a deadlock.
By using a directed graph constructed based on the lock waiting relationships from the individual partition management nodes as the global lock waiting relationship, deadlock detection for a sub-transaction based on the global lock waiting relationship can be converted into detecting whether the directed graph contains a loop that takes that sub-transaction as its starting point, which reduces the complexity of deadlock detection and improves its efficiency.
In some embodiments, when loop detection is performed on the directed graph to determine whether a loop that takes the transaction identifier of the first sub-transaction as its starting point exists in the directed graph, the directed graph may be traversed based on a depth-first search (DFS) algorithm, using the transaction identifier of the first sub-transaction as the traversal starting point, to detect whether such a loop exists.
Depth-first traversal is an algorithm for traversing or searching a tree or graph. Its core idea is to start from a starting node, go as deep as possible along one path until the deepest point is reached, then backtrack and explore other unvisited branches. Depth-first traversal is characterized by a preference for depth, exploring each path as deeply as possible until it can go no further. This makes the algorithm suitable for problems such as graph connectivity and path finding.
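As an illustrative sketch only, loop detection on the directed graph maintained above might be implemented with an iterative depth-first traversal as follows; the function name and graph structure are assumptions carried over from the previous sketches.

def triggers_deadlock(graph, start_txn_id):
    """Depth-first traversal starting from start_txn_id; returns True if some path
    of wait-for edges leads back to start_txn_id, i.e. a cyclic lock waiting
    relationship with the first sub-transaction as starting point and end point."""
    stack = [start_txn_id]
    visited = set()
    while stack:
        current = stack.pop()
        for holder in graph.edges.get(current, ()):
            if holder == start_txn_id:
                return True      # loop found: the first sub-transaction triggers a deadlock
            if holder not in visited:
                visited.add(holder)
                stack.append(holder)
    return False

For the graph of fig. 5 (C→B and B→A), triggers_deadlock(graph, 'B') returns False; if an additional edge A→B were reported, it would return True.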
In some embodiments, after deadlock detection is performed on the first sub-transaction based on the global lock waiting relationship, if it is determined that the first sub-transaction triggers a deadlock, the detection server may delete the lock waiting relationship corresponding to the first sub-transaction from the global lock waiting relationship, so as to avoid repeatedly detecting later that the first sub-transaction triggers a deadlock, which would result in abnormal deadlock detection.
Step 306: and in response to the received deadlock detection result, deleting the first sub-transaction from the lock wait queue when the first sub-transaction is determined to trigger a deadlock based on the deadlock detection result.
In this embodiment, when the detection client running on the target partition management node receives the deadlock detection result for the first sub-transaction returned by the detection server, the detection client may determine, in response to the deadlock detection result, whether the first sub-transaction triggers a deadlock based on the deadlock detection result. If the first sub-transaction triggers a deadlock, the flow of the write transaction containing the first sub-transaction is blocked, and the first sub-transaction can be deleted from the lock waiting queue, so that the blocking problem is solved, and the subsequent normal operation of the system is ensured.
It should be noted that what is actually placed in the lock waiting queue may be the thread used to execute a sub-transaction. If a sub-transaction triggers a deadlock, the thread used to execute the sub-transaction may be interrupted, which can be regarded as deleting the sub-transaction from the lock waiting queue.
In some embodiments, when the detection client running on the target partition management node determines, based on the deadlock detection result, that the first sub-transaction triggers a deadlock, in addition to deleting the first sub-transaction from the lock waiting queue, the detection client may output prompt information indicating that the first sub-transaction triggers a deadlock, so as to prompt a retry of the write transaction containing the first sub-transaction, or a check and modification of that write transaction.
In some embodiments, if the first sub-transaction is not detected to trigger a deadlock, the first sub-transaction may be executed when the lock becomes available. The detection client running on the target fragment management node may, in response to completion of the execution of the first sub-transaction, send to the detection server a deletion request for the lock waiting relationship corresponding to the first sub-transaction. The detection server may, in response to the deletion request, delete that lock waiting relationship from the global lock waiting relationship. This avoids the abnormal situation in which the first sub-transaction is detected to trigger a deadlock after it has already been executed.
In some embodiments, in order to prevent the execution efficiency of the write transaction from being affected by waiting too long for the lock to become available, the detection client running on the target partition management node may detect whether the lock waiting duration of the first sub-transaction reaches a preset lock waiting timeout threshold. The lock waiting timeout threshold can be set by a user according to actual requirements, or can be a system default value, which is not limited in this application. If the lock waiting duration reaches the lock waiting timeout threshold, the first sub-transaction may also be deleted from the lock waiting queue.
In practical applications, it may first be detected whether the lock waiting duration of the first sub-transaction reaches the lock waiting timeout threshold, and only if it does not, whether the lock waiting duration reaches the deadlock detection threshold is detected. In this way, even if the detection server is busy or abnormal and deadlock detection cannot be performed in time, the blocked sub-transaction can still be handled.
The lock waiting timeout threshold may be greater than the deadlock detection threshold. In this case, even if the first sub-transaction is not detected to trigger a deadlock, the first sub-transaction is still deleted from the lock waiting queue if it waits too long for the lock to become available.
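As a sketch of the ordering described above (with hypothetical threshold values, not taken from the embodiments), the timeout check can be performed before the deadlock detection check:

LOCK_WAIT_TIMEOUT_S = 5.0              # hypothetical; chosen greater than the detection threshold
DEADLOCK_DETECTION_THRESHOLD_S = 1.0   # hypothetical

def check_waiting_sub_transaction(waiter, now, remove_from_queue, request_deadlock_detection):
    wait_duration = now - waiter.enqueued_at
    # The timeout is checked first, so the sub-transaction is still removed from the
    # lock waiting queue even if deadlock detection could not be performed in time.
    if wait_duration >= LOCK_WAIT_TIMEOUT_S:
        remove_from_queue(waiter)
    elif wait_duration >= DEADLOCK_DETECTION_THRESHOLD_S:
        request_deadlock_detection(waiter)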
In some embodiments, after the first sub-transaction is removed from the lock wait queue, it is indicated that the first sub-transaction failed to execute, and therefore a transaction rollback is required for the write transaction containing the first sub-transaction.
In some embodiments, each transaction may have its transaction execution duration within which the transaction needs to be executed to complete. After performing the transaction rollback for the write transaction including the first sub-transaction, it may be further determined whether a remaining execution duration of the write transaction exceeds a preset transaction retry threshold, and if so, a transaction retry may be performed for the write transaction, i.e., the execution of the write transaction may be restarted.
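A minimal sketch of this retry decision, assuming a hypothetical write-transaction object that records its execution deadline, might look as follows:

TRANSACTION_RETRY_THRESHOLD_S = 2.0    # hypothetical minimum remaining duration worth a retry

def maybe_retry(write_transaction, now):
    # Called after the write transaction containing the deleted sub-transaction
    # has been rolled back; retries it only if enough execution duration remains.
    remaining = write_transaction.deadline - now
    if remaining > TRANSACTION_RETRY_THRESHOLD_S:
        write_transaction.restart()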
In the above technical solution, a detection client running on each partition management node detects sub-transactions in its lock waiting queue whose lock waiting duration has reached the deadlock detection threshold and sends the generated lock waiting relationships corresponding to those sub-transactions to the detection server corresponding to the detection clients; the detection server updates the global lock waiting relationship based on the received lock waiting relationships and performs deadlock detection on the sub-transactions based on the updated global lock waiting relationship; if a sub-transaction is detected to trigger a deadlock, the detection client may delete that sub-transaction from the lock waiting queue.
In this way, while deadlock detection is implemented and the flow of write transactions is prevented from being blocked, deadlock detection is performed only on sub-transactions whose lock waiting duration has reached the deadlock detection threshold, so that a portion of unnecessary deadlock detection can be filtered out. In addition, the deadlock detection threshold can be adjusted according to the actual deadlock detection requirements, so that the sensitivity of the deadlock detection is controlled to adapt to different deadlock detection environments. For example, if deadlocks occur frequently, the deadlock detection threshold may be lowered to discover deadlocks faster.
Referring to fig. 6 in conjunction with fig. 1 and 2, fig. 6 is a flowchart illustrating another deadlock detection method according to an exemplary embodiment of the present application.
In this embodiment, a physical host as shown in fig. 1 or a host cluster as shown in fig. 2 may be referred to as a target electronic device. That is, the target electronic device may be configured to carry a database, and the target electronic device may include a number of shard management nodes configured to manage data shards of data stored in the database.
Each of the plurality of fragment management nodes included in the target electronic device may run a detection client, and the target electronic device may also run a detection server corresponding to the detection clients.
The deadlock detection method described above can be applied to the detection server. The deadlock detection method may include:
step 602: receiving a lock waiting relationship corresponding to the first sub-transaction, which is sent by a detection client running on any target fragment management node; the lock waiting relationship is generated by the detection client when detecting that the lock waiting time of a first sub-transaction in a lock waiting queue maintained locally reaches a preset deadlock detection threshold; the first sub-transaction is a sub-transaction split from a write transaction for data stored in the database for a data fragment corresponding to the target fragment management node.
Step 604: and in response to the received lock waiting relationship, updating a global lock waiting relationship maintained locally based on the lock waiting relationship, and performing deadlock detection on the first sub-transaction based on the global lock waiting relationship.
Step 606: and sending a deadlock detection result to a detection client running on the target fragment management node, so that the detection client responds to the received deadlock detection result, and when the first sub-transaction is determined to trigger deadlock based on the deadlock detection result, the first sub-transaction is deleted from the lock waiting queue.
The specific implementation of the embodiment shown in fig. 6 may refer to the embodiment shown in fig. 3, which will not be described in detail herein.
Referring to fig. 7 in conjunction with fig. 1 and 2, fig. 7 is a flowchart illustrating another deadlock detection method according to an exemplary embodiment of the present application.
In this embodiment, a physical host as shown in fig. 1 or a host cluster as shown in fig. 2 may be referred to as a target electronic device. That is, the target electronic device may be configured to carry a database, and the target electronic device may include a number of shard management nodes configured to manage data shards of data stored in the database.
A write transaction for data stored in the database may be split into sub-transactions for each data fragment of the data stored in the database. In this case, one shard management node may execute a sub-transaction for the data shard corresponding to this shard management node (i.e., the data shard managed by this shard management node) split from the write transaction for the data stored in the database.
Referring to fig. 8, fig. 8 is a schematic diagram of another target electronic device according to an exemplary embodiment of the present application.
As shown in fig. 8, each of the several fragment management nodes included in the target electronic device may run a detection program (also referred to as a detection module).
The above-described deadlock detection method may be applied to a detection program running on any one of the above-described several fragment management nodes (which may be referred to as a target fragment management node). The deadlock detection method may include:
step 702: detecting whether the lock waiting time length of a first sub-transaction in a lock waiting queue maintained locally reaches a preset deadlock detection threshold; the first sub-transaction is a sub-transaction which is split from a write transaction for data stored in the database and is for data fragments corresponding to the target fragment management node.
In this embodiment, the target fragment management node may maintain a lock waiting queue locally. The sub-transactions in the lock waiting queue are split from write transactions for the data stored in the database and target the data fragment corresponding to the target fragment management node. When multiple sub-transactions request the lock on a certain piece of data (e.g., a node or an edge of a graph) in the data fragment corresponding to the target fragment management node at the same time, if the lock is already occupied, the sub-transactions requesting the lock are placed in the lock waiting queue corresponding to that lock until the lock becomes available.
The detection program running on the target fragment management node can detect whether the lock waiting duration of a certain sub-transaction (which may be called a first sub-transaction) in the lock waiting queue reaches a preset deadlock detection threshold; the lock waiting duration refers to the length of time that the sub-transaction has been waiting for the lock in the lock waiting queue. The deadlock detection threshold can be set by a user according to the actual deadlock detection requirements, or can be a system default value, which is not limited in this application.
Step 704: and if the lock waiting time length reaches the deadlock detection threshold, generating a lock waiting relation corresponding to the first sub-transaction, and synchronizing the lock waiting relation from other partition management nodes.
In this embodiment, when the detection program running on the target partition management node detects that the lock waiting duration of the first sub-transaction reaches the deadlock detection threshold, a lock waiting relationship corresponding to the first sub-transaction may be generated; the lock waiting relationship describes the lock waiting situation between the first sub-transaction and other sub-transactions.
For example, assuming that sub-transaction A currently holds lock 1, if sub-transaction B needs lock 1, it must wait for sub-transaction A to release lock 1; the lock waiting relationship in which sub-transaction B waits for sub-transaction A may therefore be expressed as sub-transaction B→sub-transaction A.
In addition, the detection program running on the target fragment management node can synchronize lock waiting relationships from the other fragment management nodes. For any other fragment management node, the lock waiting relationships synchronized from that node are the lock waiting relationships currently maintained locally on that node, as generated by the detection program running on it.
Step 706: and generating a global lock waiting relationship based on the lock waiting relationship corresponding to the first sub-transaction and the synchronized lock waiting relationship, and performing deadlock detection on the first sub-transaction based on the global lock waiting relationship.
In this embodiment, after the detection program running on the target partition management node has generated the lock waiting relationship corresponding to the first sub-transaction and synchronized lock waiting relationships from the other partition management nodes, the detection program may generate a global lock waiting relationship based on the lock waiting relationship corresponding to the first sub-transaction and the synchronized lock waiting relationships, and perform deadlock detection on the first sub-transaction based on the global lock waiting relationship. The global lock waiting relationship may be formed by the detection program integrating all of these lock waiting relationships.
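For this non-central variant, the following sketch (again with hypothetical names; fetch_wait_relationships stands in for whatever synchronization mechanism is actually used between fragment management nodes) illustrates merging the locally generated lock waiting relationships with those synchronized from the other nodes into a global wait-for graph:

def build_global_wait_for_graph(local_relationships, peer_nodes):
    """local_relationships: iterable of (waiter_txn_id, holder_txn_id) pairs generated locally.
    peer_nodes: handles to the other fragment management nodes, each exposing a
    hypothetical fetch_wait_relationships() call returning its locally maintained pairs."""
    graph = {}

    def add(waiter, holder):
        graph.setdefault(waiter, set()).add(holder)

    for waiter, holder in local_relationships:
        add(waiter, holder)
    for peer in peer_nodes:
        for waiter, holder in peer.fetch_wait_relationships():  # synchronization step (assumed)
            add(waiter, holder)
    return graph

The resulting graph can then be traversed with the same depth-first loop detection approach sketched earlier to decide whether the first sub-transaction triggers a deadlock.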
In practical applications, performing deadlock detection on the first sub-transaction based on the global lock waiting relationship may specifically be detecting, based on the global lock waiting relationship, whether there exists a cyclic lock waiting relationship that takes the first sub-transaction as both its starting point and its end point.
For example, sub-transaction B→sub-transaction A→sub-transaction B is a cyclic lock wait relationship starting at sub-transaction B and ending at sub-transaction B.
In some embodiments, the lock wait relationship may include an association between a transaction identification of the first sub-transaction and a transaction identification (e.g., a transaction number, etc.) of the sub-transaction holding the lock (which may be referred to as a second sub-transaction).
Accordingly, the global lock waiting relationship may include a directed graph constructed based on the lock waiting relationships generated by the detection programs running on the respective fragment management nodes. Each vertex in the directed graph is the transaction identifier of a sub-transaction, and each directed edge in the directed graph points from the transaction identifier of the sub-transaction waiting for a lock to the transaction identifier of the sub-transaction holding that lock.
In an actual application, the lock waiting relationship may further include a partition identifier of a data partition to which the first sub-transaction belongs, so as to divide the global lock waiting relationship according to the data partition.
Referring to fig. 5, fig. 5 is a schematic diagram illustrating a global lock wait relationship according to an exemplary embodiment of the present application.
As shown in fig. 5, in the directed graph serving as the global lock waiting relationship, vertex A represents sub-transaction A, vertex B represents sub-transaction B, and vertex C represents sub-transaction C. The directed edge between vertex C and vertex B points from vertex C to vertex B, meaning that sub-transaction C waits for sub-transaction B; the directed edge between vertex B and vertex A points from vertex B to vertex A, meaning that sub-transaction B waits for sub-transaction A.
When deadlock detection is performed on the first sub-transaction based on the global lock waiting relationship, loop detection may be performed on the directed graph serving as the global lock waiting relationship, to detect whether the directed graph contains a loop that takes the transaction identifier of the first sub-transaction as its starting point (and, being a loop, also as its end point). If such a loop exists in the directed graph, it may be determined that the first sub-transaction triggers a deadlock.
By using a directed graph constructed based on the lock waiting relationships from the individual partition management nodes as the global lock waiting relationship, deadlock detection for a sub-transaction based on the global lock waiting relationship can be converted into detecting whether the directed graph contains a loop that takes that sub-transaction as its starting point, which reduces the complexity of deadlock detection and improves its efficiency.
In some embodiments, when loop detection is performed on the directed graph to determine whether a loop that takes the transaction identifier of the first sub-transaction as its starting point exists in the directed graph, the directed graph may be traversed based on a depth-first search (DFS) algorithm, using the transaction identifier of the first sub-transaction as the traversal starting point, to detect whether such a loop exists.
Depth-first traversal is an algorithm for traversing or searching a tree or graph. Its core idea is to start from a starting node, go as deep as possible along one path until the deepest point is reached, then backtrack and explore other unvisited branches. Depth-first traversal is characterized by a preference for depth, exploring each path as deeply as possible until it can go no further. This makes the algorithm suitable for problems such as graph connectivity and path finding.
Step 708: and deleting the first sub-transaction from the lock waiting queue when the first sub-transaction is determined to trigger deadlock.
In this embodiment, when the detection program running on the target partition management node determines that the first sub-transaction triggers a deadlock, the first sub-transaction may be deleted from the lock waiting queue, so as to resolve the blocking of the flow of the write transaction containing the first sub-transaction and ensure the subsequent normal operation of the system.
It should be noted that what is actually placed in the lock waiting queue may be the thread used to execute a sub-transaction. If a sub-transaction triggers a deadlock, the thread used to execute the sub-transaction may be interrupted, which can be regarded as deleting the sub-transaction from the lock waiting queue.
In some embodiments, when the detection program running on the target partition management node determines that the first sub-transaction triggers a deadlock, the lock waiting relationship corresponding to the first sub-transaction may be deleted, so as to avoid repeatedly detecting later that the first sub-transaction triggers a deadlock, which would result in abnormal deadlock detection.
In practical applications, after the deadlock detection for the first sub-transaction is completed, the global lock waiting relationship may be deleted regardless of whether the first sub-transaction triggers a deadlock.
In some embodiments, when the detection program running on the target partition management node determines that the first sub-transaction triggers a deadlock, in addition to deleting the first sub-transaction from the lock waiting queue, the detection program may output prompt information indicating that the first sub-transaction triggers a deadlock, so as to prompt a retry of the write transaction containing the first sub-transaction, or a check and modification of that write transaction.
In some embodiments, if the first sub-transaction is not detected to trigger a deadlock, the first sub-transaction may be executed when the lock becomes available. The detection program running on the target fragment management node may, in response to completion of the execution of the first sub-transaction, delete the lock waiting relationship corresponding to the first sub-transaction that it maintains locally. This avoids the abnormal situation in which the first sub-transaction is detected to trigger a deadlock after it has already been executed.
In some embodiments, in order to prevent the execution efficiency of the write transaction from being affected by waiting too long for the lock to become available, the detection program running on the target partition management node may detect whether the lock waiting duration of the first sub-transaction reaches a preset lock waiting timeout threshold. The lock waiting timeout threshold can be set by a user according to actual requirements, or can be a system default value, which is not limited in this application. If the lock waiting duration reaches the lock waiting timeout threshold, the first sub-transaction may also be deleted from the lock waiting queue.
In practical applications, it may first be detected whether the lock waiting duration of the first sub-transaction reaches the lock waiting timeout threshold, and only if it does not, whether the lock waiting duration reaches the deadlock detection threshold is detected. In this way, even if the detection program is busy or abnormal and deadlock detection cannot be performed in time, the blocked sub-transaction can still be handled.
The lock waiting timeout threshold may be greater than the deadlock detection threshold. In this case, even if the first sub-transaction is not detected to trigger a deadlock, the first sub-transaction is still deleted from the lock waiting queue if it waits too long for the lock to become available.
In some embodiments, after the first sub-transaction is removed from the lock wait queue, it is indicated that the first sub-transaction failed to execute, and therefore a transaction rollback is required for the write transaction containing the first sub-transaction.
In some embodiments, each transaction may have its transaction execution duration within which the transaction needs to be executed to complete. After performing the transaction rollback for the write transaction including the first sub-transaction, it may be further determined whether a remaining execution duration of the write transaction exceeds a preset transaction retry threshold, and if so, a transaction retry may be performed for the write transaction, i.e., the execution of the write transaction may be restarted.
In the above technical solution, a detection program running on each partition management node detects sub-transactions in its lock waiting queue whose lock waiting duration has reached the deadlock detection threshold, generates a global lock waiting relationship based on the generated lock waiting relationships corresponding to those sub-transactions and the lock waiting relationships synchronized from the other partition management nodes, and performs deadlock detection on the sub-transactions based on the global lock waiting relationship; if a sub-transaction is detected to trigger a deadlock, the detection program may delete that sub-transaction from the lock waiting queue.
In this way, on the one hand, while deadlock detection is implemented and the flow of write transactions is prevented from being blocked, deadlock detection is performed only on sub-transactions whose lock waiting duration has reached the deadlock detection threshold, so that a portion of unnecessary deadlock detection can be filtered out. On the other hand, the deadlock detection threshold can be adjusted according to the actual deadlock detection requirements, so that the sensitivity of the deadlock detection is controlled to adapt to different deadlock detection environments. For example, if deadlocks occur frequently, the deadlock detection threshold may be lowered to discover deadlocks faster. On yet another hand, by adopting a non-central deadlock detection mode in which the detection programs running on the partition management nodes perform the deadlock detection themselves, the problem of untimely deadlock detection caused by a blocked central detection server can be avoided.
Corresponding to the embodiment of the deadlock detection method, the application also provides an embodiment of the deadlock detection device.
Referring to fig. 9, fig. 9 is a schematic structural view of an apparatus according to an exemplary embodiment of the present application. At the hardware level, the device includes a processor 902, an internal bus 904, a network interface 906, a memory 908, and a non-volatile storage 910, and may of course also include other hardware as required. One or more embodiments of the present application may be implemented in software, for example by the processor 902 reading a corresponding computer program from the non-volatile storage 910 into the memory 908 and then running it. Of course, in addition to software implementations, one or more embodiments of the present application do not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic modules, but may also be hardware or logic devices.
Referring to fig. 10, fig. 10 is a block diagram of a deadlock detection device according to an exemplary embodiment of the present application.
The deadlock detection device can be applied to the equipment shown in fig. 9 to realize the technical scheme of the application. Specifically, the target electronic device carrying the database includes a plurality of fragment management nodes; the fragmentation management node is used for managing data fragmentation of the data stored in the database; the plurality of fragment management nodes respectively run detection clients; a detection server corresponding to the detection client is also operated on the target electronic equipment; the device is applied to the detection client running on any target fragment management node.
The deadlock detection device may include:
a detection module 1002, configured to detect whether a lock waiting duration of a first sub-transaction in a lock waiting queue maintained locally reaches a preset deadlock detection threshold; the first sub-transaction is a sub-transaction which is split from a write transaction for data stored in the database and is for data slicing corresponding to the target slicing management node;
a generating module 1004, configured to generate a lock waiting relationship corresponding to the first sub-transaction and send the lock waiting relationship to the detection server if the lock waiting duration reaches the deadlock detection threshold, so that the detection server responds to the received lock waiting relationship, updates a global lock waiting relationship maintained locally based on the lock waiting relationship, performs deadlock detection on the first sub-transaction based on the global lock waiting relationship, and returns a deadlock detection result;
a deleting module 1006, configured to, in response to the received deadlock detection result, delete the first sub-transaction from the lock waiting queue when it is determined, based on the deadlock detection result, that the first sub-transaction triggers a deadlock.
Referring to fig. 11, fig. 11 is a block diagram of another deadlock detection device according to an exemplary embodiment of the present application.
The deadlock detection device can be applied to the equipment shown in fig. 9 to realize the technical scheme of the application. Specifically, the target electronic device carrying the database includes a plurality of fragment management nodes; the fragmentation management node is used for managing the data fragmentation stored in the database; the plurality of fragment management nodes respectively run detection clients; a detection server corresponding to the detection client is also operated on the target electronic equipment; the device is applied to the detection server.
The deadlock detection device may include:
a receiving module 1102, configured to receive a lock waiting relationship corresponding to the first sub-transaction sent by a detection client running on any target partition management node; the lock waiting relationship is generated by the detection client when detecting that the lock waiting time of a first sub-transaction in a lock waiting queue maintained locally reaches a preset deadlock detection threshold; the first sub-transaction is a sub-transaction split from a write transaction for data stored in the database for a data fragment corresponding to the target fragment management node;
a detection module 1104, configured to, in response to the received lock waiting relationship, update a locally maintained global lock waiting relationship based on the lock waiting relationship, and perform deadlock detection on the first sub-transaction based on the global lock waiting relationship;
a sending module 1106, configured to send a deadlock detection result to the detection client running on the target fragment management node, so that the detection client, in response to the received deadlock detection result, deletes the first sub-transaction from the lock waiting queue when it determines, based on the deadlock detection result, that the first sub-transaction triggers a deadlock.
Referring to fig. 12, fig. 12 is a block diagram of another deadlock detection apparatus according to an exemplary embodiment of the present application.
The deadlock detection device can be applied to the equipment shown in fig. 9 to realize the technical scheme of the application. Specifically, the target electronic device carrying the database includes a plurality of fragment management nodes; the fragmentation management node is used for managing data fragmentation of the data stored in the database; the plurality of fragment management nodes respectively run detection programs; the device is applied to a detection program running on any target fragment management node.
The deadlock detection device may include:
a duration detection module 1202, configured to detect whether a lock waiting duration of a first sub-transaction in a lock waiting queue maintained locally reaches a preset deadlock detection threshold; the first sub-transaction is a sub-transaction which is split from a write transaction for data stored in the database and is for data slicing corresponding to the target slicing management node;
a generating module 1204, configured to generate a lock waiting relationship corresponding to the first sub-transaction and synchronize lock waiting relationships from other partition management nodes if the lock waiting duration reaches the deadlock detection threshold;
a deadlock detection module 1206, configured to generate a global lock waiting relationship based on the lock waiting relationship corresponding to the first sub-transaction and the synchronized lock waiting relationships, and to perform deadlock detection on the first sub-transaction based on the global lock waiting relationship;
a deleting module 1208, configured to delete the first sub-transaction from the lock waiting queue upon determining that the first sub-transaction triggers a deadlock.
Since the device embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative; the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e., they may be located in one place or distributed over a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the technical solution of the present application.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the present application. In some cases, the acts or steps recited in the present application may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The terminology used in one or more embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of one or more embodiments of the application. The singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "and/or" refers to and encompasses any or all possible combinations of one or more of the associated listed items.
The descriptions of the terms "one embodiment," "some embodiments," "example," "specific example," or "one implementation" and the like used in connection with one or more embodiments of the present application mean that a particular feature or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. The schematic descriptions of these terms are not necessarily directed to the same embodiment. Furthermore, the particular features or characteristics described may be combined in any suitable manner in one or more embodiments of the application. Furthermore, different embodiments, as well as specific features or characteristics of different embodiments, may be combined without contradiction.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present application to describe various information, this information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of the present application. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining".
The foregoing description is merely of preferred embodiments of the present application and is not intended to limit the present application; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the embodiments of the present application shall be included within the scope of the present application.
User information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) referred to herein are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to authorize or deny.