Disclosure of Invention
(I) Technical problem to be solved
In view of the shortcomings of the prior art, the invention provides a data distribution caching method and a data distribution caching system that address the low access efficiency of the prior art, effectively reducing the memory that cached data occupies in the central database and improving data access speed.
(II) technical scheme
In order to achieve the above object, in one aspect, the present invention provides a data distribution caching method, including:
S1, when service data is generated or changed, adding node ID information to each data item according to a preset configuration policy, and storing the data in a central database;
S2, distributing the data: taking the service data out of the central database and distributing it to the corresponding node server according to the node ID information carried by each data item;
S3, fetching, from the node server, the data distributed to the local node, and synchronizing it to the cache of the local node;
S4, connecting each user group to a specific node, so that users access the service data through the node cache.
Preferably, all the service data are stored in the central database uniformly.
Preferably, in step S1, the configuration policy includes:
when a service-related table is designed, adding a service-independent split field as the basis for distributing data to each node;
filling the tables of the same service group with the same split field value;
when service data is generated, configuring and filling in the value of the corresponding split field according to the attribute of the service;
when a distributed cache node synchronizes its data, synchronizing the data whose split field values have the same attribute to the cache of that node.
Preferably, the step S2 further includes the steps of:
when data in the central database changes, firing a trigger in the central database, the trigger checking the change type of the data and querying a configuration table for the node ID to which the changed data needs to be distributed, and, if the data needs to be distributed to a node, generating a logic log for that node;
recording the logic log in a data synchronization table;
periodically checking the data synchronization table, and, when logic logs need to be synchronized, taking them out of the synchronization table, merging the logic logs of the same node into batches, and placing them in the work queue of a distribution thread;
parsing the content of the data synchronization table into the corresponding logic logs;
and parsing individual logic logs out of the batch log information, extracting their content, and applying the modifications to the cache.
Preferably, the step S3 further includes the steps of:
reading node information from a configuration file, and querying a synchronization condition table using the node information;
querying the conditions for synchronizing data and assembling the SQL statement that queries the data to be synchronized;
and querying the data to be synchronized to the node with the assembled SQL statement, and writing the queried data into the cache.
Preferably, the step S4 further includes the steps of:
assigning a corresponding user group to each distribution node;
when a user accesses the system, configuring the server that the user accesses to be the distribution node to which the user belongs;
the distribution node synchronizing the data of the user and providing services to the user.
On the other hand, the invention also provides a data distribution cache system, which comprises a central database and at least one distribution node; wherein,
the central database consists of a plurality of database units and is used for uniformly storing all service data and configuring node ID information for each service data;
each distribution node comprises a distribution cluster and at least one synchronization server, wherein the distribution cluster consists of distribution servers and is used for taking the service data out of the central database and distributing it to the corresponding synchronization server according to the node ID information carried by each data item; the synchronization server is used for acquiring, from the distribution server, the data distributed to it and synchronizing that data to its local cache for users to access.
In another aspect, the present invention also provides a data distribution caching system, where the system includes:
the data storage module is used for adding node ID information to each data item according to a preset configuration policy when service data is generated or changed, and storing the data in the central database;
the data distribution module is used for distributing the data, taking the service data out of the central database and distributing it to the corresponding node server according to the node ID information carried by each data item;
the data synchronization module is used for fetching, from the node server, the data distributed to the local node and synchronizing it to the cache of the local node;
and the access control module is used for connecting each user group to a specific node, so that users access the service data through the node cache.
Preferably, the data storage module includes a policy configuration module, and the policy configuration module further includes:
the field design module is used for adding a service-independent split field when a service-related table is designed, the split field serving as the basis for distributing data to each node;
the table splitting module is used for filling the tables of the same service group with the same split field value;
the service splitting module is used for configuring and filling in the value of the corresponding split field according to the attribute of the service when service data is generated;
and the synchronization policy module is used for synchronizing the data whose split field values have the same attribute to the cache of a distributed cache node when that node synchronizes its data.
Preferably, the data distribution module further comprises:
the log generation module is used for firing a trigger in the central database when data in the central database changes, the trigger checking the change type of the data and querying the configuration table for the node ID to which the changed data needs to be distributed, and, if the data needs to be distributed to a node, generating a logic log for that node;
the recording module is used for recording the logic log in the data synchronization table;
the polling module is used for periodically checking the data synchronization table, and, when logic logs need to be synchronized, taking them out of the synchronization table, merging the logic logs of the same node into batches, and placing them in the work queue of a distribution thread;
the table parsing module is used for parsing the content of the data synchronization table into the corresponding logic logs;
and the log parsing module is used for parsing individual logic logs out of the batch log information, extracting their content, and applying the modifications to the cache.
(III) advantageous effects
In the scheme of the invention, data with the same category attribute is divided into the same data group, and the service data with the same attribute is then distributed by a distribution process to the cache nearest to the user and made available to the service, thereby reducing the memory that cached data occupies in the central database and improving data access speed.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
An embodiment of the invention provides a processing method for data distribution caching, which divides data with the same category attribute into the same data group, distributes the service data with the same attribute through a distribution process to the cache nearest to the user, and makes that data available to the service, thereby reducing the memory that cached data occupies in the central database and improving data access speed. The invention can address the problem of accessing large volumes of data and provides a processing method for the distributed caching of mass data. Referring to fig. 1, the data distribution caching method of the present invention includes the steps of:
S1, when service data is generated or changed, adding node ID information to each data item according to a preset configuration policy, and storing the data in a central database;
S2, distributing the data: taking the service data out of the central database and distributing it to the corresponding node server according to the node ID information carried by each data item;
S3, fetching, from the node server, the data distributed to the local node, and synchronizing it to the cache of the local node;
S4, connecting each user group to a specific node, so that users access the service data through the node cache.
Specifically, the scheme adopted in the embodiment of the invention is as follows. In terms of networking, a centralized networking mode is adopted: a central node deploys a unified database for unified data storage, and the data of every distributed cache node is persisted in the database of the central node. The data of a distribution node is a subset of the data in the central database; which data belongs to the subset can be configured in the central database according to service requirements, so that each distribution node holds only the data it needs. After data changes, the distribution process synchronizes the changed data to the corresponding distribution nodes. When a service needs to access data, it reads the service data in the distribution node directly, without needing to know where the service data is stored in the database. Accordingly, the key technology of the data distribution caching method of the present invention is divided into the following parts: data service distribution, data synchronous distribution, and service data access control.
First, the main function of data service distribution is to configure the data required by each distribution node, so that data is distributed to each cache node according to the configuration. It further comprises the following steps:
1. when a service-related table is designed, adding a service-independent field, namely a split field, as the basis for distributing data to each node;
2. filling the tables of the same service group with the same split field value;
3. when service data is generated, configuring and filling in the value of the corresponding split field according to the attribute of the service;
4. when a distributed cache node synchronizes its data, synchronizing the data whose split field values have the same attribute to the cache of that node.
For example, in a province-wide network deployment, each city node has a node ID, and the node ID corresponds to a service-independent field such as split ID. When data needs to be split and distributed to a node, the value of split ID is filled into the data, and data with the same split ID value is synchronized to the same node.
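The split-field example above can be illustrated with a minimal Python sketch. The field name `split_id`, the city names, and the mapping `CITY_TO_SPLIT_ID` are hypothetical and chosen only for illustration; the text does not prescribe any particular representation.

```python
from collections import defaultdict

# Hypothetical mapping from a service attribute (a city) to the split ID,
# which here doubles as the node ID of the city node.
CITY_TO_SPLIT_ID = {"city_a": 1, "city_b": 2}


def tag_record(record: dict) -> dict:
    """Return a copy of the record with the service-independent split field filled in."""
    tagged = dict(record)
    tagged["split_id"] = CITY_TO_SPLIT_ID[record["city"]]
    return tagged


def group_by_split_id(records):
    """Group tagged records so that equal split_id values end up on the same node."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["split_id"]].append(rec)
    return dict(groups)


records = [
    {"user": "u1", "city": "city_a"},
    {"user": "u2", "city": "city_b"},
    {"user": "u3", "city": "city_a"},
]
tagged = [tag_record(r) for r in records]
per_node = group_by_split_id(tagged)
```

In this sketch the records of `u1` and `u3` are grouped under split ID 1 and the record of `u2` under split ID 2, which mirrors the rule that data with the same split ID value is synchronized to the same node.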
The main function of data synchronous distribution is to synchronize and distribute data. Data synchronization loads data into the cache when a node is initialized; data distribution sends data belonging to a node to that node when the data changes. This part is accordingly subdivided into data synchronization and data distribution, wherein data synchronization further comprises the steps of:
1. reading node information from the configuration file, and querying the synchronization condition table using the node information, wherein the synchronization condition table contains the following information: node ID, table ID, condition ID, and synchronization condition value;
2. querying the conditions for synchronizing data by node ID and table ID, wherein the condition information is a character string of a form such as 'field name = %d and field name = %d', and the value corresponding to each %d is stored in the condition table;
3. assembling the SQL statement that queries the data to be synchronized, by writing the condition values into the condition string; the SQL statement can be formed directly with the sprintf function of the C language;
4. querying the data to be synchronized to the node with the assembled SQL statement, and writing the queried data, namely the data corresponding to the node, into the cache.
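The four synchronization steps above can be sketched in Python as follows, under stated assumptions. An in-memory SQLite database stands in for the central database; the table and column names (`sync_condition`, `condition_text`, `condition_values`, `user_info`) and the mapping `TABLE_NAMES` are hypothetical. Python's `%` string formatting plays the role that sprintf plays in the text.

```python
import sqlite3

# In-memory stand-in for the central database; all names are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sync_condition (node_id INTEGER, table_id INTEGER,
                                 condition_id INTEGER, condition_text TEXT,
                                 condition_values TEXT);
    CREATE TABLE user_info (user_id INTEGER PRIMARY KEY, name TEXT, split_id INTEGER);
    INSERT INTO sync_condition VALUES (1, 10, 100, 'split_id = %d', '1');
    INSERT INTO user_info VALUES (1, 'alice', 1), (2, 'bob', 2), (3, 'carol', 1);
""")

TABLE_NAMES = {10: "user_info"}  # hypothetical table ID -> table name mapping


def build_sync_sql(node_id: int, table_id: int) -> str:
    """Steps 1-3: look up the condition string and its values, then assemble the SQL."""
    condition_text, raw_values = db.execute(
        "SELECT condition_text, condition_values FROM sync_condition "
        "WHERE node_id = ? AND table_id = ?",
        (node_id, table_id),
    ).fetchone()
    # Values are forced to int before being written into the %d placeholders.
    values = tuple(int(v) for v in raw_values.split(","))
    where = condition_text % values
    return f"SELECT * FROM {TABLE_NAMES[table_id]} WHERE {where}"


def init_cache(node_id: int, table_id: int) -> dict:
    """Step 4: run the assembled query and load the rows into a cache keyed by primary key."""
    sql = build_sync_sql(node_id, table_id)
    return {row[0]: row for row in db.execute(sql)}


NODE_ID = 1  # in the described method this would come from the node's configuration file
cache = init_cache(NODE_ID, 10)
```

Because the SQL text is assembled by string formatting, as in the described sprintf approach, the sketch converts every condition value to an integer first; an implementation that accepts non-numeric condition values would need additional validation or parameter binding.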
Data distribution further comprises the steps of:
1. when data in the central database changes, a trigger in the central database is fired; the trigger checks the change type of the data, which is insertion, update, or deletion, and queries the configuration table for the node ID to which the changed data needs to be distributed; if the data needs to be distributed to a node, a logic log is generated for that node;
2. the logic log is recorded in a data synchronization table, which contains the following information: sequence ID, node ID, table ID, operation type, table name, primary key value, and primary key field name;
3. a distribution process of the distribution node starts a polling thread that periodically checks the data synchronization table; when logic logs need to be synchronized, the polling thread takes them out of the synchronization table, merges the logic logs of the same node into batches, and places them in the work queue of a distribution thread; a waiting worker thread sends the logic logs over a TCP connection to the cache receiving process of the corresponding node; after the worker thread has sent the logic logs successfully, the successfully synchronized data is deleted from the synchronization table and the queue is emptied, after which the polling thread can continue to write logic logs to be synchronized into the queue;
4. the content of a logic log corresponds to the content of the data synchronization table, and the content of the data synchronization table is parsed into the corresponding logic log through this correspondence; the parsed logic log contains: sequence ID, node ID, table ID, operation type, primary key value, and record value; the data synchronization table is parsed into a logic log as follows: a record is taken out of the data synchronization table and the corresponding content is filled into the logic log, for example, the sequence ID of the table corresponds to the sequence ID of the logic log and the primary key ID corresponds to the primary key value; the record value of the logic log is fetched from the database using the table name, the primary key value, and the primary key field name, and is then assembled into a binary record value according to the field types;
5. the cache receiving process waits for and receives the logic log information, parses individual logic logs out of the batch log information, extracts their content, and applies the modifications to the cache; by setting the primary key, a log can be executed repeatedly without causing data inconsistency.
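The distribution steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the described implementation: an in-memory SQLite database stands in for the central database, the trigger uses the hypothetical `split_id` column directly as the node ID instead of querying a configuration table, the record value is kept as a row tuple rather than a binary value, and the TCP transfer between the distribution process and the cache receiving process is replaced by a direct function call. All table, column, and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE user_info (user_id INTEGER PRIMARY KEY, name TEXT, split_id INTEGER);
    -- Data synchronization table: one logic log per change.
    CREATE TABLE data_sync (seq_id INTEGER PRIMARY KEY AUTOINCREMENT,
                            node_id INTEGER, table_id INTEGER, op_type TEXT,
                            table_name TEXT, pk_value INTEGER, pk_field TEXT);
    CREATE TRIGGER user_info_ins AFTER INSERT ON user_info BEGIN
        INSERT INTO data_sync (node_id, table_id, op_type, table_name, pk_value, pk_field)
        VALUES (NEW.split_id, 10, 'insert', 'user_info', NEW.user_id, 'user_id');
    END;
    CREATE TRIGGER user_info_upd AFTER UPDATE ON user_info BEGIN
        INSERT INTO data_sync (node_id, table_id, op_type, table_name, pk_value, pk_field)
        VALUES (NEW.split_id, 10, 'update', 'user_info', NEW.user_id, 'user_id');
    END;
    CREATE TRIGGER user_info_del AFTER DELETE ON user_info BEGIN
        INSERT INTO data_sync (node_id, table_id, op_type, table_name, pk_value, pk_field)
        VALUES (OLD.split_id, 10, 'delete', 'user_info', OLD.user_id, 'user_id');
    END;
""")


def poll_once() -> dict:
    """Take pending logic logs out of the sync table and batch them per node."""
    batches = defaultdict(list)
    rows = db.execute(
        "SELECT seq_id, node_id, table_id, op_type, table_name, pk_value, pk_field "
        "FROM data_sync ORDER BY seq_id"
    ).fetchall()
    for seq_id, node_id, table_id, op_type, table_name, pk_value, pk_field in rows:
        record = None
        if op_type != "delete":
            # Fetch the current record value by table name and primary key.
            record = db.execute(
                f"SELECT * FROM {table_name} WHERE {pk_field} = ?", (pk_value,)
            ).fetchone()
        batches[node_id].append({"seq_id": seq_id, "table_id": table_id,
                                 "op": op_type, "pk": pk_value, "record": record})
    return dict(batches)


def apply_batch(cache: dict, batch: list) -> None:
    """Apply each logic log to a cache keyed by primary key; replaying is harmless."""
    for log in batch:
        if log["op"] == "delete" or log["record"] is None:
            cache.pop(log["pk"], None)
        else:
            cache[log["pk"]] = log["record"]


def ack(batch: list) -> None:
    """Delete successfully synchronized logs from the sync table."""
    db.executemany("DELETE FROM data_sync WHERE seq_id = ?",
                   [(log["seq_id"],) for log in batch])


db.execute("INSERT INTO user_info VALUES (1, 'alice', 1)")
db.execute("INSERT INTO user_info VALUES (2, 'bob', 2)")
db.execute("UPDATE user_info SET name = 'alicia' WHERE user_id = 1")

batches = poll_once()
node1_cache = {}
apply_batch(node1_cache, batches[1])
apply_batch(node1_cache, batches[1])  # replaying the same batch leaves the cache unchanged
ack(batches[1])
```

Because every cache write is keyed by the primary key and overwrites the previous value, applying the same batch twice yields the same cache contents, which is the property the text attributes to setting the primary key.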
Finally, service data access control provides a data routing mode that directs a service to the designated service server so that the corresponding data can be accessed. In the centralized deployment mode, users are connected, according to their attributes (for example, user data), to the nearest of the distribution nodes; each user is connected to the node to which the user belongs and can therefore access the data belonging to that node. The access control specifically comprises the following steps:
1. each distribution node serves a designated user group;
2. when a user accesses the system, the accessed server is configured to be the distribution node server to which the user belongs;
3. the distribution node synchronizes the data of the user so that services can be provided to the user.
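The access control steps above amount to a routing lookup followed by a read from the node cache. A minimal Python sketch, in which the group names, node names, and cache contents are all hypothetical, might look like this:

```python
# Hypothetical assignment of user groups to distribution nodes.
GROUP_TO_NODE = {"city_a_users": "node1", "city_b_users": "node2"}

# Hypothetical per-node caches, assumed to be already filled by synchronization.
NODE_CACHES = {
    "node1": {"u1": {"balance": 10}},
    "node2": {"u2": {"balance": 20}},
}


def route(user_group: str) -> str:
    """Return the distribution node that serves the given user group."""
    return GROUP_TO_NODE[user_group]


def read_user_data(user_group: str, user_id: str):
    """Read a user's data from the cache of the node serving the user's group."""
    node = route(user_group)
    return NODE_CACHES[node].get(user_id)
```

A user in `city_b_users` is routed to `node2` and reads only from that node's cache; data belonging to another node is simply absent there, which reflects the statement that each node holds only the data it needs.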
As can be seen from the above description, the data distribution caching system of the present invention is shown in fig. 2 and mainly includes a central database and at least one distribution node (fig. 2 takes two distribution nodes, distribution node 1 and distribution node 2, as an example). The central database consists of a plurality of database units (in fig. 2, n database units from database 1 to database n) and is used for uniformly storing all service data and configuring node ID information for each service data item. Each distribution node comprises a distribution cluster and at least one synchronization server, wherein the distribution cluster consists of distribution servers and is used for taking the service data out of the central database and distributing it to the corresponding synchronization server according to the node ID information carried by each data item; the synchronization server is used for acquiring, from the distribution server, the data distributed to it and synchronizing that data to its local cache for users to access. In fig. 2, two distribution servers (a host and a standby) form an HA (high availability) cluster system. The local cache of each synchronization server manages the service data in the form of a database (fig. 2 takes shineDB as an example, managing service data such as service 1 and service 2).
Furthermore, those skilled in the art will understand that all or part of the steps in the method of the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer readable storage medium and, when executed, performs the steps of the method of the above embodiments; the storage medium may be ROM/RAM, a magnetic disk, an optical disk, a memory card, and the like. Therefore, corresponding to the method of the present invention, the present invention also includes a data distribution caching system, which is generally expressed in the form of functional modules corresponding to the steps of the method; the system comprises a data storage module, a data distribution module, a data synchronization module and an access control module, wherein,
the data storage module is used for adding node ID information to each data item according to a preset configuration policy when service data is generated or changed, and storing the data in the central database;
the data distribution module is used for distributing the data, taking the service data out of the central database and distributing it to the corresponding node server according to the node ID information carried by each data item;
the data synchronization module is used for fetching, from the node server, the data distributed to the local node and synchronizing it to the cache of the local node;
and the access control module is used for connecting each user group to a specific node, so that users access the service data through the node cache.
The scheme of the invention has the following advantages:
1. the distributed caching method provided by the invention can be configured conveniently, distributes data to different distribution nodes, and serves the users assigned to those nodes, thereby reducing the data volume on each distribution node and improving data access performance;
2. the data division method provided by the invention is independent of the services, and a configuration-based data distribution method is provided, which simplifies the program code needed for configuration and offers a simple way to synchronize data from a physical database to a distributed cache in real time;
3. by distributing data to different nodes, the system can be separated into mutually independent nodes, which reduces the interdependence of the nodes, simplifies the implementation of the system, lowers the configuration required of each node server, and reduces the networking cost of the system.
The above embodiments are only for illustrating the invention and are not to be construed as limiting it. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention; therefore, all equivalent technical solutions also fall within the scope of the invention, and the scope of the invention is defined by the claims.