Disclosure of Invention
The technical problem to be solved by the invention is to provide a dynamically constructed distributed data cluster control system and a method thereof, wherein the system can provide dynamically configurable data node addition and removal, ensure the data consistency among data nodes and provide low-delay service for data reading and writing.
The technical solution for realizing the purpose of the invention is as follows: a dynamically constructed distributed data cluster control system comprises a server device, a management device and N client devices, wherein the server device is simultaneously connected with the client devices and the management device; the server device comprises a plurality of service data node instances, all the service data node instances form a service node cluster, each service data node instance is called a service node, one service node serves as a central main node of the service node cluster, and the rest service nodes are data slave nodes;
the server device is used for analyzing and loading the network and basic configuration information of the distributed data cluster nodes;
the management device provides the user with an interface for managing the distributed data cluster; through a query language the user can query the state and data information of the distributed data cluster, and can initialize and update data; the cluster state indicates whether each service node is working normally or has failed;
and the client device is used for inquiring the central main node, establishing connection and executing data synchronization and data updating operation.
A dynamically constructed distributed data cluster control method based on the system comprises the following steps:
step 1, a user parses the configuration information of the distributed data cluster through the server device; the configuration information specifically comprises cluster node configuration information, cluster node network configuration information, data storage configuration information and log record configuration information;
step 2, running distributed data cluster nodes with parameters, and orderly adding newly added nodes into a cluster data queue one by one;
step 3, sequencing the service nodes into an ordered ring according to the size of the Hash value of each service node in the server device; then, each service node is connected with the right side node which is closest to the service node and normally works to form a stable ordered closed loop, and a dynamic distributed data cluster is constructed;
step 4, the management device establishes a connection with the server device;
step 5, the data versions of the management device and the server device are compared, and if the data version of the server device is newer than that of the management device, a data synchronization operation is executed to update the data version of the management device;
step 6, if the management device cannot acquire the data version of the server device, which indicates that the distributed data cluster of the server device is a newly constructed cluster, the data of the server device is initialized according to the database table structure of the management device;
step 7, the client device establishes connection with the server device for communication, and acquires a central main node for connection; the client device continuously detects the connection state of the client device and the central main node, and if the central main node fails, a new central main node is obtained for connection;
step 8, when the server device has new data, it notifies the client device to perform a data synchronization operation;
step 9, when there is no data update notification from the server device, the client device queries the data held in the server device to provide data service;
step 10, the management device queries the node state and the data state of each node of the distributed data cluster in the server device;
step 11, the management device queries the data version of the server device through a data query language;
step 12, after the respective data versions of the management device and the client device are updated, the updates are synchronized into the database of the server device;
and step 13, the server device distributes the new version data to the management device and the client device for data synchronization.
Compared with the prior art, the invention has the following remarkable advantages: 1) when a data node receives a plurality of data updates, the updates are put into one transaction and submitted to the database together, which reduces data synchronization delay; 2) data updates are registered through the central main node, and the other nodes are notified of each update immediately, which improves the timeliness of data updating; 3) continuous data updates are kept in the memory of the central main node in incremental form for a certain time, which reduces the amount of data to be processed; 4) the data nodes update data through the central main node, which ensures data consistency.
The present invention is described in further detail below with reference to the attached drawing figures.
Detailed Description
With reference to fig. 1, the dynamically constructed distributed data cluster control system of the present invention includes a server device 1, a management device 2, and N client devices 3, where the server device 1 is connected to the client devices 3 and the management device 2 at the same time; the server device 1 comprises a plurality of service data node instances, all the service data node instances form a service node cluster, each service data node instance is called a service node, one service node serves as the central main node of the service node cluster, and the remaining service nodes are data slave nodes;
the server device 1 is configured to parse and load the network and basic configuration information of the distributed data cluster nodes, where the basic configuration information includes distributed cluster data node configuration information, distributed cluster data node network configuration information, data storage configuration information, and log record configuration information.
The management device 2 provides the user with an interface for managing the distributed data cluster; through a query language the user can query the state and data information of the distributed data cluster, and can initialize and update data; the cluster state indicates whether each service node is working normally or has failed. Before the distributed data cluster runs, the management device 2 initializes the data structure, that is, establishes a database table structure.
The client device 3 is used for querying the central main node, establishing a connection, and executing data synchronization and data updating operations.
Further, one service node is used as the central main node of the service node cluster, and the specific process of selecting the central main node is as follows:
1) presetting a priority for each service node in the service node cluster, wherein the priorities corresponding to the service nodes are the same or different;
2) sorting the service nodes by priority in descending order;
3) if only one service node corresponding to the maximum priority is available, taking the service node as a central main node; if a plurality of service nodes corresponding to the maximum priority are available, one service node is randomly selected as a central main node through a Hash method;
4) the client device 3 sends heartbeat packets to the server device 1 at regular intervals; if a service node fails to receive a heartbeat packet within the set time, the service node is deemed to have failed and is removed from the service node cluster, and its data is copied to other service nodes; if the removed node is the central main node, steps 2) and 3) are repeated to reselect a central main node from the remaining service nodes.
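The selection process above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the names `ServiceNode`, `elect_central_main_node` and `remove_failed` are hypothetical, and because the text does not specify the Hash method used to break ties, the sketch assumes a deterministic rule (the smallest SHA-256 value of the node identifier wins) so that every participant reaches the same result.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceNode:
    node_id: str   # hypothetical identifier, e.g. "host:port"
    priority: int  # preset priority; several nodes may share a value


def stable_hash(value: str) -> int:
    """Deterministic hash, identical on every machine (unlike built-in hash())."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def elect_central_main_node(nodes):
    """Pick the node with the highest priority; break ties by hash value."""
    if not nodes:
        return None
    top = max(n.priority for n in nodes)
    candidates = [n for n in nodes if n.priority == top]
    # Assumed tie-break rule: smallest hash of the node identifier wins.
    return min(candidates, key=lambda n: stable_hash(n.node_id))


def remove_failed(nodes, last_heartbeat, now, timeout):
    """Keep only nodes whose last heartbeat arrived within `timeout` seconds."""
    return [
        n for n in nodes
        if now - last_heartbeat.get(n.node_id, float("-inf")) <= timeout
    ]
```

After `remove_failed` drops a node, calling `elect_central_main_node` again on the remaining nodes corresponds to repeating steps 2) and 3).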
With reference to fig. 2, the dynamically constructed distributed data cluster control method of the present invention includes the following steps:
step 1, a user parses the configuration information of the distributed data cluster through the server device 1; the configuration information specifically comprises cluster node configuration information, cluster node network configuration information, data storage configuration information and log record configuration information;
step 2, running distributed data cluster nodes with parameters, and orderly adding newly added nodes into a cluster data queue one by one;
step 3, the service nodes in the server device 1 are sorted into an ordered ring according to the Hash value of each service node; each service node is then connected to the nearest normally working node on its right, forming a stable ordered closed loop and thereby constructing the dynamic distributed data cluster;
step 4, the management device 2 establishes a connection with the server device 1;
step 5, the data versions of the management device 2 and the server device 1 are compared, and if the data version of the server device 1 is newer than that of the management device 2, a data synchronization operation is executed to update the data version of the management device 2;
step 6, if the management device 2 cannot acquire the data version of the server device 1, which indicates that the distributed data cluster of the server device 1 is a newly constructed cluster, the data of the server device 1 is initialized according to the database table structure of the management device 2;
step 7, the client device 3 establishes a connection with the server device 1 for communication and obtains the central main node to connect to; the client device 3 continuously monitors its connection with the central main node, and if the central main node fails, it obtains a new central main node and connects to it;
step 8, when the server device 1 has new data, it notifies the client device 3 to perform a data synchronization operation, as shown in fig. 3, specifically:
step 8-1, a data update from one client device 3 is registered by the central main node of the server device 1;
step 8-2, the central main node notifies the other data slave nodes of the server device 1 to update the data;
if the data of a plurality of client devices 3 is updated in succession, the central main node registers the updates in the order in which they occur, keeps them in its memory for a certain time to form a data update packet, submits the data update packet to the database of the server device 1, and then notifies the other data slave nodes to update the data;
step 9, when there is no data update notification from the server device 1, the client device 3 queries the data held in the server device 1 to provide data service, which improves service responsiveness;
step 10, the management device 2 queries the node state and the data state of each node of the distributed data cluster in the server device 1;
step 11, the management device 2 queries the data version of the server device 1 through a data query language;
step 12, after the respective data versions of the management device 2 and the client device 3 are updated, the updates are synchronized into the database of the server device 1, as shown in fig. 4, specifically:
step 12-1, a data node, namely the management device 2 or one of the client devices 3, writes its data updates into memory to form a data packet;
step 12-2, when the transaction is submitted, the data packet is sent to the central main node;
step 12-3, data conflicts are analyzed according to the following rules:
1) at any time, the data state of the central main node is the final state of the data;
2) when the data of a data node lags behind the central main node, the data node can still update data, except that:
a) the data node cannot add an entry that already exists on the central main node; entries are identified by primary key, and entries with the same primary key are considered the same entry;
b) the data node cannot delete an entry that does not exist on the central main node;
c) the data node cannot modify an entry that has been modified on the central main node, that is, when the entry on the central main node is inconsistent with the entry on the data node, the data node cannot modify the entry;
3) when the data of a data node lags behind the data of the central main node and causes an update conflict, the update is rolled back and cannot be submitted to the database of the data node;
4) when the data of a data node lags behind the data of the central main node and causes an update conflict, the data node actively synchronizes with the central main node;
step 12-4, the central main node writes the received data packet into its database and immediately sends the data packet to the other data nodes, namely the management device 2 or the client devices 3; at the same time it places the data packet in a memory queue to be written to persistent storage in order, so that it does not have to wait for the disk IO operation to complete before returning, which reduces disk IO delay; if a data node receives a plurality of data packets at the same time, it submits them together to its own database for updating;
step 13, the server device 1 distributes the new version data to the management device 2 and the client devices 3 for data synchronization.
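The ring construction of step 3 can be illustrated with a short Python sketch. It is a sketch only: the function names, the use of SHA-256 as the Hash function, and the `is_alive` callback are assumptions introduced for illustration, since the description does not fix them. Each working node is linked to the next working node in hash order, and the last node wraps around to the first, which yields the closed loop.

```python
import hashlib


def node_hash(node_id: str) -> int:
    """Deterministic hash of a node identifier (SHA-256 is an assumption)."""
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(node_ids, is_alive):
    """Return {node: successor} for the working nodes, ordered by hash value.

    Failed nodes are skipped, so each working node links to the nearest
    working node on its right; the last node wraps around to the first.
    """
    # Sort by hash value; the identifier itself breaks any hash collision.
    ordered = sorted(node_ids, key=lambda n: (node_hash(n), n))
    live = [n for n in ordered if is_alive(n)]
    return {n: live[(i + 1) % len(live)] for i, n in enumerate(live)}
```

When a node fails or a new node joins, rebuilding the mapping with the updated node list restores the closed loop; only the neighbours of the affected position change.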
Further, the connection between the management device 2 and the server device 1 in step 4 and the connection between the client device 3 and the server device 1 in step 7 are both TCP communication connections.
Further, the data packet is sent to the central main node in step 12-2 with TCP_NODELAY enabled, which forces the data packet to be sent immediately and reduces network delay.
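As an illustration of the TCP_NODELAY setting, the following Python sketch enables the option on a TCP socket using the standard `socket` module. The helper names `enable_nodelay` and `connect_to_central_main_node` are hypothetical, as are the host, port and timeout parameters.

```python
import socket


def enable_nodelay(sock: socket.socket) -> socket.socket:
    """Disable Nagle's algorithm so small packets are sent without waiting."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def connect_to_central_main_node(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection to the central main node with TCP_NODELAY set."""
    sock = socket.create_connection((host, port), timeout=timeout)
    return enable_nodelay(sock)
```

With the option set, a small data packet written at transaction commit is handed to the network at once instead of being held back to be coalesced with later writes.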
Further, after the data packet is immediately sent to the other data nodes in step 12-4, the data nodes communicate with the central main node at regular intervals to query the data state, and when a data node is found to have an update conflict, it is updated immediately.
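The conflict rules of step 12-3 can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the description: tables are modelled as dictionaries keyed by primary key, a data packet is a list of `(op, pk, row)` tuples in which each primary key appears at most once, and the check runs against the state of the central main node. All function names are hypothetical.

```python
def find_conflicts(central, local_before, updates):
    """Return the (op, pk) pairs that violate the conflict rules.

    central:      {pk: row} on the central main node (the final state)
    local_before: {pk: row} as the data node saw it before the transaction
    updates:      list of (op, pk, row), op in {"add", "delete", "modify"}
    """
    conflicts = []
    for op, pk, _row in updates:
        if op == "add":
            bad = pk in central  # entry already exists on the central node
        elif op == "delete":
            bad = pk not in central  # entry does not exist on the central node
        elif op == "modify":
            # Entry is missing or differs from the data node's view.
            bad = pk not in central or central[pk] != local_before.get(pk)
        else:
            raise ValueError(f"unknown operation: {op}")
        if bad:
            conflicts.append((op, pk))
    return conflicts


def commit_or_rollback(central, local_before, updates):
    """Apply the packet atomically, or reject it if any rule is violated.

    Returns (committed, state). On conflict nothing is applied and the
    returned state is the central state the data node should resync to.
    """
    if find_conflicts(central, local_before, updates):
        return False, dict(central)
    new_state = dict(central)
    for op, pk, row in updates:
        if op == "delete":
            del new_state[pk]
        else:
            new_state[pk] = row
    return True, new_state
```

A rejected packet corresponds to rules 3) and 4): the data node rolls back its update and then synchronizes with the state returned by the central main node.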
In summary, with the technical scheme of the invention, a distributed data cluster can be dynamically constructed in a stable data service scenario, with functional support for data distribution, local caching, replica redundancy backup, fault tolerance, transactions, concurrency control and the like; data consistency is ensured during data migration, fault handling and concurrent reading and writing, and applications are provided with storage access support and data synchronization for a remote central database; to address processor computation delay, network delay, disk IO delay and the like, low-delay data management techniques are adopted, improving the responsiveness of data updating and synchronization.