Fusion management system and method for heterogeneous service system
Technical Field
The invention relates to the technical field of data processing, in particular to a fusion management system and a fusion management method for a heterogeneous service system.
Background
For most enterprises, the systems used for internal management and business management generally include both self-developed systems and purchased systems. Because they are built on different technologies, the self-developed and purchased systems are generally heterogeneous. How to manage these heterogeneous systems in a unified way has therefore become a problem that urgently needs to be solved. This is the problem of fusing heterogeneous systems, and in such fusion, the fusion of multi-source data is the primary problem.
In the prior art, the fusion of multi-source data comprises acquiring the multi-source data, preprocessing the data, and processing and storing the data. To ensure data security, the index data is segmented and stored in a distributed mode: the segmented index data is distributed to different storage nodes, and the data of each storage node is additionally stored on different nodes in the form of two copies. The problem is that if a node storing copy data also stores other segmented data, the security of the data storage is reduced. In addition, the data is first stored at random, the load of each storage node is judged only afterwards, and load data is then migrated between the idle nodes and the overloaded nodes, which generates a great deal of unnecessary system overhead.
Disclosure of Invention
One objective of the present invention is to provide a fusion management method for heterogeneous service systems, so as to improve the security of data storage, reduce the load data migration volume, and reduce the system overhead.
The method comprises: collecting multi-source data; preprocessing the multi-source data; judging the single-node storage load and, according to the judgment result, constructing an ordered queue of idle nodes sorted by degree of storage load; segmenting the index data and distributing the segmented data to different idle nodes for storage in a distributed index mode according to the ordered queue of idle nodes; generating two copies of the data stored in the same idle node; and judging whether a given idle node stores segmented data and, if it does not, storing the generated copy data into that idle node.
The method has the following beneficial effects. Before data is stored, the single-node storage load is judged, an ordered queue of idle nodes is constructed by degree of storage load according to the judgment result, and the segmented data is then distributed to different idle nodes for storage in a distributed index mode according to that queue. This effectively avoids the situation in which data is stored on a fully loaded or overloaded node and a large amount of data must later be migrated between nodes, so the system overhead is reduced. At the same time, because the data does not need a second migration afterwards, or needs only a small number of them, the security of the data is improved in this respect. In addition, before the copy data is stored, it is judged whether a given idle node stores segmented data, and the generated copy data is stored into that idle node only if it does not. This prevents copy data and other segmented data from being stored on the same node, which would create a potential security hazard (distributed storage is adopted precisely to improve data security), and thus further improves the security of the data storage.
Further, the segmented data are randomly distributed to different idle nodes in the queue for storage. A random distribution mode is adopted instead of sequential storage in order of storage load degree, which effectively improves the security of data storage.
Further, after the segmented data is distributed to different idle nodes for storage, the storage load of each idle node that has stored segmented data is judged again, and if an idle node is calculated to be fully loaded, that node is marked as a fully loaded node.
After the data segmented this time is stored in an idle node, there is no guarantee that the node is still idle. If the node has become fully loaded, it needs to be marked as a fully loaded node so that no data is stored on it the next time data is stored. This avoids producing overloaded nodes as far as possible and reduces the data migration volume.
Further, the storable data volume of each idle node is calculated, the data volume of each piece of segmented data is matched against the storable data volume of each idle node, and the segmented data is stored in the idle node with the highest matching degree.
Storing the segmented data in the idle node with the highest matching degree balances the data volume stored on each storage node as far as possible (most storage nodes end up in a fully loaded state), minimizes the generation of overloaded storage nodes, and reduces the number of subsequent load migrations.
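The text does not define how the matching degree is computed. The following is a minimal Python sketch under one possible assumption: the best match is the idle node whose remaining storable volume exceeds the segment size by the smallest margin (a best-fit rule). The function name `match_segments`, the node and segment names, and the sizes are all hypothetical.

```python
def match_segments(segment_sizes, free_space):
    """Best-fit placement: each segment goes to the idle node whose
    remaining storable volume is closest to (but not below) its size.

    Assumption: "matching degree" is taken to mean the smallest leftover
    space after placement; the source does not define it.
    """
    remaining = dict(free_space)  # copy so the caller's dict is not modified
    placement = {}
    # Place larger segments first so they are not starved of space.
    ordered = sorted(segment_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for segment, size in ordered:
        candidates = [
            (space - size, node)
            for node, space in remaining.items()
            if space >= size
        ]
        if not candidates:
            raise ValueError(f"no idle node can hold segment {segment!r}")
        _, best = min(candidates)  # smallest leftover wins
        placement[segment] = best
        remaining[best] -= size
    return placement, remaining


# Hypothetical data: segment sizes and storable volume per idle node.
segments = {"s1": 30, "s2": 50, "s3": 20}
free = {"n1": 80, "n4": 60, "n5": 25}
placement, remaining = match_segments(segments, free)
print(placement)  # {'s2': 'n4', 's1': 'n1', 's3': 'n5'}
print(remaining)  # {'n1': 50, 'n4': 10, 'n5': 5}
```

Under this rule the 50-unit segment lands on the node with 60 units free rather than the one with 80, which leaves the larger node available for later data.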
Further, in order to ensure load balance of each storage node as much as possible without affecting system performance, the method also comprises the step of carrying out overload calculation on the storage nodes, and if one storage node is calculated to be an overload node, carrying out data migration on the overload node and migrating data to an idle node.
The second objective of the present invention is to provide a fusion management system for heterogeneous service systems.
A fusion management system for a heterogeneous service system comprises: a multi-source data acquisition module, used for acquiring multi-source data; a multi-source data preprocessing module, used for preprocessing the multi-source data, the preprocessing comprising data cleaning and conversion; a storage load judgment module, used for judging the single-node storage load so as to identify idle nodes and construct an ordered queue of idle nodes according to the degree of storage load; an index data storage module, used for segmenting the index data and distributing the segmented data to different idle nodes in a distributed index mode according to the ordered queue of idle nodes; a copy generation module, used for generating two copies of the data stored in the same idle node; an idle node stored-data judgment module, used for judging whether a given idle node stores segmented data; and a copy data storage module, used for storing the generated copy data into an idle node when that idle node does not store segmented data.
The system of the invention has the following advantages. Through the storage load judgment module, the single-node storage load is judged before the segmented data is stored, so as to identify idle nodes and construct an ordered queue of idle nodes according to the degree of storage load; the segmented data is then distributed to different idle nodes in a distributed index mode according to that queue. As a result, segmented data is not stored at random on fully loaded or overloaded storage nodes, which improves system operating performance, reduces the data volume of subsequent load migration, and reduces the system overhead. Furthermore, before copy data is stored, the system judges through the idle node stored-data judgment module whether a given idle node stores segmented data, which prevents copy data and other segmented data from being stored on the same node and compromising data security.
Furthermore, in order to identify fully loaded storage nodes in time and avoid overloading them the next time data is stored, the system also comprises a node marking module, used for judging the storage load of each idle node that has stored segmented data, and marking an idle node as a fully loaded node if it is calculated to be fully loaded.
Further, the system also comprises a data volume matching module, used for calculating the storable data volume of each idle node, matching the data volume of each piece of segmented data against the storable data volume of each idle node, and storing the segmented data in the idle node with the highest matching degree.
Storing the segmented data in the idle node with the highest matching degree through the data volume matching module balances the data volume stored on each storage node as far as possible, minimizes the generation of overloaded storage nodes, and reduces the number of subsequent load migrations.
Drawings
Fig. 1 is a schematic block diagram of an embodiment of a fusion management system for a heterogeneous service system according to the present invention.
Detailed Description
The following is further detailed by way of specific embodiments:
In this embodiment, the multi-source data is collected by obtaining structured and unstructured multi-source data from the system logs and network data of the various heterogeneous systems; the network data is collected by a web crawler.
In this embodiment, the cleaning includes extracting the data, in a manner including but not limited to full extraction, incremental extraction, and static and dynamic data capture, so as to eliminate errors and inconsistencies and check the validity of the data. Data cleaning and conversion belong to the prior art and are not described again here.
The method of this embodiment further comprises judging the single-node storage load and, according to the judgment result, constructing the ordered queue of idle nodes by degree of storage load. Specifically, the storage load rate of each storage node is calculated and the average load of the system is calculated; if the storage load rate of a storage node is smaller than the average load of the system, that storage node is an idle node, and it is then inserted into the idle node queue in a stacking manner.
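The idle-node selection described above can be sketched in Python as follows. This is an illustrative sketch only. It assumes that the storage load rate is used volume divided by capacity, that the system average load is the mean of the per-node load rates, and that the queue is ordered from least to most loaded; the `StorageNode` class, the node names, and the figures are hypothetical, and a sorted list stands in for the stack-style insertion mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    name: str
    capacity: int  # total storable data volume (hypothetical unit)
    used: int      # data volume currently stored

    @property
    def load_rate(self) -> float:
        # Assumption: load rate = used volume / capacity.
        return self.used / self.capacity


def build_idle_queue(nodes):
    """Return the nodes whose load rate is below the system average,
    ordered from least loaded to most loaded."""
    if not nodes:
        return []
    average = sum(n.load_rate for n in nodes) / len(nodes)
    idle = [n for n in nodes if n.load_rate < average]
    return sorted(idle, key=lambda n: n.load_rate)


nodes = [
    StorageNode("n1", 100, 20),
    StorageNode("n2", 100, 90),
    StorageNode("n3", 100, 60),
    StorageNode("n4", 100, 40),
]
idle_queue = build_idle_queue(nodes)
print([n.name for n in idle_queue])  # ['n1', 'n4']
```

Here the average load rate is 0.525, so only n1 (0.20) and n4 (0.40) qualify as idle nodes.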
The index data is then segmented, and the segmented data is distributed to different idle nodes for storage in a distributed index mode according to the ordered queue of idle nodes, using hash function mapping. Specifically, the segmented data is distributed at random among the idle nodes in the queue rather than sequentially in order of storage load; the storable data volume of each idle node is calculated, the data volume of each piece of segmented data is matched against the storable data volume of each idle node, and the segmented data is stored in the idle node with the highest matching degree.
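The text does not name the hash function or the mapping rule. As a rough sketch, assuming a SHA-256 digest reduced modulo the number of idle nodes, the mapping could look like the Python below. The function `assign_by_hash` and the segment keys are hypothetical. A plain modulo mapping can place two segments on the same node, so a real system would need an additional rule to keep segments on different nodes.

```python
import hashlib


def assign_by_hash(segment_keys, idle_nodes):
    """Map each segment key to one idle node with a stable hash.

    Assumption: SHA-256 modulo the queue length stands in for the
    unspecified hash function mapping in the text.
    """
    if not idle_nodes:
        raise ValueError("no idle nodes available")
    placement = {}
    for key in segment_keys:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        # Use the first 8 bytes of the digest as an unsigned integer.
        index = int.from_bytes(digest[:8], "big") % len(idle_nodes)
        placement[key] = idle_nodes[index]
    return placement


idle_nodes = ["n1", "n4", "n5"]
keys = ["index-part-0", "index-part-1", "index-part-2"]
placement = assign_by_hash(keys, idle_nodes)
for key, node in placement.items():
    print(key, "->", node)
```

Because the hash is deterministic, the same segment key always maps to the same node for a given queue, which lets a reader locate a segment without a lookup table.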
Two copies are generated from the data stored in the same idle node; one copy serves as a backup and the other serves queries. It is then judged whether a given idle node stores segmented data, and if it does not, the generated copy data is stored into that idle node.
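The copy placement rule can be illustrated with the following Python sketch. It assumes that the two copies of a node's data go to two different idle nodes and that eligible nodes are used in round-robin order; neither detail is stated in the text. The function `place_copies` and all node and segment names are hypothetical.

```python
from itertools import cycle


def place_copies(segment_to_node, idle_nodes, copies=2):
    """For each node that holds segmented data, choose `copies` idle nodes
    that hold no segmented data as targets for its copy data."""
    occupied = set(segment_to_node.values())
    # Only idle nodes that store no segmented data may receive copies.
    eligible = [n for n in idle_nodes if n not in occupied]
    if len(eligible) < copies:
        raise ValueError("not enough segment-free idle nodes for the copies")
    targets = cycle(eligible)  # round-robin is an assumption of this sketch
    plan = {}
    for source in sorted(occupied):
        # Consecutive picks are distinct because copies <= len(eligible).
        plan[source] = [next(targets) for _ in range(copies)]
    return plan


segment_to_node = {"s1": "n1", "s2": "n4"}
idle_nodes = ["n1", "n4", "n5", "n6", "n7"]
plan = place_copies(segment_to_node, idle_nodes)
print(plan)  # {'n1': ['n5', 'n6'], 'n4': ['n7', 'n5']}
```

No copy lands on n1 or n4, the nodes that already hold segments, which is the condition the judgment step enforces.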
This embodiment further includes, after the segmented data has been distributed to different idle nodes for storage, judging again the storage load of each idle node that has stored segmented data; if an idle node is calculated to be fully loaded, it is marked as a fully loaded node.
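A minimal Python sketch of this re-check follows. The text gives no numeric criterion for "fully loaded", so the threshold `full_threshold=0.9` is purely an assumption, as are the function name `mark_full_nodes` and the sample load rates.

```python
def mark_full_nodes(idle_queue, load_rates, full_threshold=0.9):
    """Split the idle queue into nodes that stay idle and nodes that must
    be marked as fully loaded after the latest round of storage.

    Assumption: a node counts as fully loaded once its load rate reaches
    `full_threshold`; the source does not give a value.
    """
    still_idle, full = [], []
    for node in idle_queue:
        if load_rates[node] >= full_threshold:
            full.append(node)
        else:
            still_idle.append(node)
    return still_idle, full


# Hypothetical load rates measured after the segments were stored.
idle_queue = ["n1", "n4", "n5"]
load_rates = {"n1": 0.55, "n4": 0.93, "n5": 0.90}
still_idle, full = mark_full_nodes(idle_queue, load_rates)
print(still_idle)  # ['n1']
print(full)        # ['n4', 'n5']
```

Nodes returned in `full` would be excluded from the next storage round, so that they do not become overloaded.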
In order to balance the load of each storage node, the method further includes performing overload calculation on the storage nodes, and if it is calculated that a certain storage node is an overload node, performing data migration on the overload node, and migrating data to an idle node.
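The migration step can be sketched as follows. The text does not say how an overloaded node is detected or which data is moved. This Python sketch assumes a node is overloaded when its load rate exceeds `overload_rate`, and that the smallest segment is moved first to the least loaded other node. All names and figures are hypothetical.

```python
def load_rate(node):
    """Load rate of a node: stored volume divided by capacity."""
    return sum(node["segments"].values()) / node["capacity"]


def rebalance(nodes, overload_rate=0.9):
    """Move segments off overloaded nodes until each is at or below
    `overload_rate`, or until no other node can accept more data.

    Assumptions: the overload threshold, the choice of the smallest
    segment, and the choice of the least loaded target are illustrative.
    """
    moves = []
    if len(nodes) < 2:
        return moves
    for name, node in nodes.items():
        while node["segments"] and load_rate(node) > overload_rate:
            target_name = min(
                (n for n in nodes if n != name),
                key=lambda n: load_rate(nodes[n]),
            )
            target = nodes[target_name]
            segment, size = min(node["segments"].items(), key=lambda kv: kv[1])
            new_used = sum(target["segments"].values()) + size
            if new_used / target["capacity"] > overload_rate:
                break  # moving would overload the target; stop here
            del node["segments"][segment]
            target["segments"][segment] = size
            moves.append((segment, name, target_name))
    return moves


nodes = {
    "n1": {"capacity": 100, "segments": {"a": 60, "b": 25, "c": 10}},
    "n2": {"capacity": 100, "segments": {"d": 20}},
    "n3": {"capacity": 100, "segments": {"e": 50}},
}
moves = rebalance(nodes)
print(moves)  # [('c', 'n1', 'n2')]
```

In this example n1 starts at a load rate of 0.95; moving the 10-unit segment to n2 brings it down to 0.85, so a single migration is enough.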
As shown in Fig. 1, this embodiment further discloses a fusion management system for a heterogeneous service system that corresponds to the method. Besides the modules described below, the system also has other modules corresponding to the steps of the method, which are not described in detail here.
Specifically, the fusion management system of this embodiment comprises: a multi-source data acquisition module, used for acquiring multi-source data; a multi-source data preprocessing module, used for preprocessing the multi-source data, the preprocessing comprising data cleaning and conversion; a storage load judgment module, used for judging the single-node storage load so as to identify idle nodes and construct an ordered queue of idle nodes according to the degree of storage load; and an index data storage module, used for segmenting the index data and distributing the segmented data to different idle nodes in a distributed index mode according to the ordered queue of idle nodes.
The system also comprises a node marking module, used for judging the storage load of each idle node that has stored segmented data, and marking an idle node as a fully loaded node if it is calculated to be fully loaded.
The system also comprises: a copy generation module, used for generating two copies of the data stored in the same idle node; an idle node stored-data judgment module, used for judging whether a given idle node stores segmented data; and a copy data storage module, used for storing the generated copy data into an idle node when that idle node does not store segmented data.
The foregoing is merely an embodiment of the present invention, and well-known specific structures and characteristics are not described here in detail. A person skilled in the art knows the common technical knowledge in the field as of the filing date or the priority date, has access to the prior art in the field, and is able to apply the routine experimental means available before that date. Such a person can, in light of the teachings of this application, complete and implement the invention in combination with their own abilities, and certain typical known structures or known methods should not become obstacles to implementing the invention. It should be noted that a person skilled in the art can make several changes and improvements without departing from the structure of the present invention. These should also be regarded as falling within the scope of protection of the present invention, and they will not affect the effect of implementing the invention or the practicability of the patent. The scope of protection claimed by this application shall be determined by the content of the claims, and the detailed description and other content of the specification may be used to interpret the claims.