Method and device for data synchronization based on a broadcast mechanism
Technical field
The present invention relates to the fields of computer networks and computer software, and in particular to a method and device for data synchronization based on a broadcast mechanism.
Background technology
With the arrival of the big-data era, data in every industry is growing explosively, and the need to synchronize data among an enterprise's business systems, and between each sub-business system and the core business system, is becoming increasingly prominent.
A typical big-data synchronization scheme at present comprises:
(1) exporting a data file from the data source;
(2) copying the data file to the target node;
(3) importing the data file into the target data source on the target node.
However, the above prior art has the following shortcomings:
(1) the scheme supports only one-to-one operation and cannot synchronize the data file of a data source to multiple target data sources;
(2) the operating procedure is error-prone, and any error leads to data loss or corruption;
(3) if the data volume of the data source is very large, substantial resources such as CPU, memory, and bandwidth are consumed.
Summary of the invention
In view of this, the purpose of the present invention is to provide a method and device for data synchronization based on a broadcast mechanism, so as to overcome the above shortcomings of the prior art.
The technical solution of the present invention is a method for data synchronization based on a broadcast mechanism, the method including:
obtaining a data file to be synchronized;
compressing the data file;
evenly splitting the compressed data file into multiple file blocks, and mapping the resulting file blocks into a corresponding data structure, each node of which corresponds to one file block;
broadcasting the file blocks, in the form of data messages, to target nodes corresponding to the business type of the data file, wherein each target node merges the received file blocks according to the order of the nodes in the data structure and decompresses the merged file blocks to obtain the data file.
Optionally, the data structure includes but is not limited to: a set, a linked list, or a stack.
Optionally, while the file blocks are mapped into the corresponding linked list, the file blocks are named according to a preset naming rule, and the storage format of the linked list is of the key-value type.
Optionally, the method further includes: when the received file blocks are merged to generate the data file, merging and decompressing the file blocks according to the preset naming rule to generate the data file.
The present invention also provides a device for data synchronization based on a broadcast mechanism, the device including:
a data acquisition module, for obtaining a data file to be synchronized;
a data compression module, for compressing the data file;
a data splitting module, for evenly splitting the compressed data file into multiple file blocks and mapping the resulting file blocks into a corresponding data structure, each node of which corresponds to one file block;
a data synchronization module, for broadcasting the file blocks, in the form of data messages, to target nodes corresponding to the business type of the data file, wherein each target node merges the received file blocks according to the order of the nodes in the data structure and decompresses the merged file blocks to obtain the data file.
Optionally, the data structure includes but is not limited to: a set, a linked list, or a stack.
Optionally, the data splitting module is further configured to: while mapping the file blocks into the corresponding linked list, name the file blocks according to a preset naming rule, the storage format of the linked list being of the key-value type.
Optionally, the data synchronization module is further configured to: when the received file blocks are merged to generate the data file, merge and decompress the file blocks according to the preset naming rule to generate the data file.
With the method and device provided by the invention, the data to be synchronized can be automatically and efficiently compressed, split into blocks by a distributed application, and distributed block by block to multiple target nodes by broadcast; the blocks are then merged and decompressed at each target node. This achieves synchronization of the data file while ensuring synchronization performance and the reliability and security of the data.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:
Fig. 1 is a schematic flowchart of a method for data synchronization based on a broadcast mechanism according to one embodiment of the invention;
Fig. 2 is a schematic diagram of a device for data synchronization based on a broadcast mechanism according to one embodiment of the invention.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings. The illustrative embodiments and their descriptions are intended to explain the invention, not to limit it.
Those skilled in the art will appreciate that embodiments of the present invention may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may take the form of complete hardware, complete software (including firmware, resident software, microcode, etc.), or a combination of hardware and software.
The terms used herein are to be understood as follows:
NDE: a lossless data compression algorithm whose implementation is thread-safe.
Hadoop: a distributed system infrastructure developed by the Apache Software Foundation. Users can develop distributed programs without knowing the low-level details of distribution, making full use of the power of a cluster for high-speed computation and storage.
NoSQL database: a non-relational database.
SQL database: SQL is a set of operation commands created specifically for databases, and is a full-featured database language.
Exemplary method
A method for data synchronization based on a broadcast mechanism according to an exemplary embodiment of the invention is described below with reference to Fig. 1. The method includes:
Step S101: obtaining a data file to be synchronized;
Step S102: compressing the data file;
Step S103: evenly splitting the compressed data file into multiple file blocks, and mapping the resulting file blocks into a corresponding data structure, each node of which corresponds to one file block;
Step S104: broadcasting the file blocks, in the form of data messages, to target nodes corresponding to the business type of the data file, wherein each target node merges the received file blocks according to the order of the nodes in the data structure and decompresses the merged file blocks to obtain the data file.
Optionally, the data structure includes but is not limited to: a set, a linked list, or a stack.
Optionally, while the file blocks are mapped into the corresponding linked list, the file blocks are named according to a preset naming rule, and the storage format of the linked list is of the key-value type.
Optionally, the method further includes: when the received file blocks are merged to generate the data file, merging and decompressing the file blocks according to the preset naming rule to generate the data file.
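As a rough illustration of steps S101 to S104, the following Python sketch runs the whole cycle in memory. It is not the implementation described by the invention: `zlib` stands in for the compression algorithm, a plain list of (index, block) pairs plays the role of the ordered data structure, and the function names `synchronize` and `restore` are hypothetical.

```python
import zlib


def synchronize(data: bytes, block_count: int) -> list:
    """Source side: compress, split evenly, and tag each block with its index."""
    compressed = zlib.compress(data)
    # Ceiling division, so that at most block_count blocks are produced.
    size = max(1, -(-len(compressed) // block_count))
    return [(index, compressed[offset:offset + size])
            for index, offset in enumerate(range(0, len(compressed), size))]


def restore(blocks: list) -> bytes:
    """Target side: merge the blocks in index order, then decompress."""
    merged = b"".join(chunk for _, chunk in sorted(blocks))
    return zlib.decompress(merged)
```

Because each block carries its index, the target side can rebuild the file even if the blocks arrive out of order; only the last block may be shorter than the others.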
Embodiment
The invention is described in detail below with reference to a specific embodiment. It should be noted that this embodiment serves only to better describe the invention and does not unduly limit it.
First, the data to be synchronized is obtained and converted into a file.
In an embodiment of the present invention, the data file to be synchronized may be obtained in ways including but not limited to the following: from a SQL Server database using the BCP command, from Hadoop using the hadoop fs command, from a MySQL database using the mysqldump command, and from MongoDB using the mongoexport command. Since these methods are well known to those skilled in the art, they are not described in detail here.
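One way to organize the export step is a lookup from source type to command line. The sketch below only builds the argument list and does not run anything. The function name `export_command`, the server name `localhost`, and the database name `exampledb` are placeholders, and the flags shown are commonly documented ones rather than options prescribed by the invention; they should be checked against the installed tool versions.

```python
def export_command(source_type: str, name: str, out_file: str) -> list:
    """Build (but do not run) an export command line for a source type.

    `name` is a table, an HDFS path, or a collection, depending on the source.
    """
    templates = {
        # bcp <table> out <file>: character mode, trusted connection
        "sqlserver": ["bcp", name, "out", out_file, "-c", "-T", "-S", "localhost"],
        # Copy a file from HDFS to the local file system
        "hadoop": ["hadoop", "fs", "-get", name, out_file],
        # Dump one table of a (placeholder) database to a file
        "mysql": ["mysqldump", "--result-file=" + out_file, "exampledb", name],
        # Export one collection of a (placeholder) database to a file
        "mongodb": ["mongoexport", "--db=exampledb",
                    "--collection=" + name, "--out=" + out_file],
    }
    try:
        return templates[source_type.lower()]
    except KeyError:
        raise ValueError("unsupported source type: " + source_type) from None
```

The resulting list could be passed to `subprocess.run` once the connection details are filled in.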
Second, the data file is compressed.
Specifically, the data file is compressed using a data compression algorithm. In an embodiment of the present invention, the NDE compression ratio is adjusted according to the size of the data file, so that the compression process remains simple and decompression speed and memory consumption reach optimal values.
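The idea of tuning compression to file size can be sketched as follows. NDE is not available as a standard library, so `zlib` is used here purely as a stand-in. The size thresholds, the levels, and the names `choose_level` and `compress_file_bytes` are assumptions made for illustration.

```python
import zlib

# Hypothetical policy: small inputs get the strongest (slowest) level, large
# inputs get a faster level to bound CPU time.
LEVELS = [(1 << 20, 9), (64 << 20, 6)]   # (maximum size in bytes, zlib level)
DEFAULT_LEVEL = 1                        # anything larger than 64 MiB


def choose_level(size: int) -> int:
    """Pick a zlib compression level from the input size in bytes."""
    for limit, level in LEVELS:
        if size <= limit:
            return level
    return DEFAULT_LEVEL


def compress_file_bytes(data: bytes) -> bytes:
    """Compress data with a level chosen from its size (lossless)."""
    return zlib.compress(data, choose_level(len(data)))
```

Whatever level is chosen, `zlib.decompress` restores the original bytes, so the target node does not need to know which level the source used.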
Then, the compressed data file is evenly split into multiple file blocks, and the resulting file blocks are mapped into a linked list, each node of which corresponds to one file block.
The method of splitting the data when the data structure is a linked list is described in detail below.
First, the data file is split evenly according to the number of nodes preconfigured for the data file.
Then, each node of the linked list is traversed, and the file blocks obtained by splitting the data file are mapped onto the linked list.
The storage format of each file block in the linked list is: (KEY: node identifier, VALUE: split file block).
Meanwhile, the file blocks are renamed in split order according to the naming rule: original file name_block index_total number of split files.
Finally, the file blocks are broadcast, in the form of data messages, to the target nodes corresponding to the business type of the data file, wherein each target node merges the received file blocks according to the order of the nodes in the linked list and decompresses the merged file blocks to obtain the data file.
First, the file blocks stored in the nodes of the linked list are traversed, and each file block is sent to the broadcasting center in the form of an MQ message.
Second, after receiving all the file blocks, the broadcasting center looks up the subscribing nodes and delivers the file blocks matching each subscribed business type to the corresponding subscribing node of the target data source.
Then, after the subscribing node has received the file blocks, it parses their names according to the naming rule and determines whether all file blocks of a given data source have arrived. Once all have been received, it merges them in block-index order to generate the data file.
Finally, the merged data file is decompressed and imported into the target data source according to the type of the source data.
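The broadcast and merge steps can be sketched with an in-memory publish/subscribe model. This is a simplification under stated assumptions: no real message queue is involved, the center forwards each block as it arrives instead of waiting for the full set, `zlib` stands in for the compression algorithm, and the classes `BroadcastCenter` and `Receiver` are hypothetical.

```python
import zlib
from collections import defaultdict


class BroadcastCenter:
    """In-memory stand-in for a message-queue based broadcasting center."""

    def __init__(self):
        self.subscribers = defaultdict(list)    # business type -> receivers

    def subscribe(self, business_type, receiver):
        self.subscribers[business_type].append(receiver)

    def publish(self, business_type, block_name, block):
        # Every node subscribed to this business type gets every block.
        for receiver in self.subscribers.get(business_type, []):
            receiver.on_block(block_name, block)


class Receiver:
    """Target node: collects blocks, then merges and decompresses them."""

    def __init__(self):
        self.pending = defaultdict(dict)        # file name -> {index: block}
        self.files = {}                         # file name -> restored bytes

    def on_block(self, block_name, block):
        # Parse "<original name>_<block index>_<total>" from the right, so
        # underscores inside the original file name are preserved.
        file_name, index, total = block_name.rsplit("_", 2)
        blocks = self.pending[file_name]
        blocks[int(index)] = block
        if len(blocks) == int(total):           # all blocks have arrived
            merged = b"".join(blocks[i] for i in range(int(total)))
            self.files[file_name] = zlib.decompress(merged)
            del self.pending[file_name]
```

A receiver subscribed to the business type "orders" restores a file only after the last of its blocks has arrived, in whatever order they were published; receivers subscribed to other business types see nothing. Importing the restored file into the target data source is left out of the sketch.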
In one embodiment of the invention, the ways of importing into the target data source include but are not limited to the following: a SQL Server data source is imported using the BCP command, Hadoop using the hadoop fs command, a MySQL database using the mysqlimport command, and MongoDB using the mongoimport command.
The present invention also provides a device 2 for data synchronization based on a broadcast mechanism, the device including:
a data acquisition module 21, for obtaining a data file to be synchronized;
a data compression module 22, for compressing the data file;
a data splitting module 23, for evenly splitting the compressed data file into multiple file blocks and mapping the resulting file blocks into a corresponding data structure, each node of which corresponds to one file block;
a data synchronization module 24, for broadcasting the file blocks, in the form of data messages, to target nodes corresponding to the business type of the data file, wherein each target node merges the received file blocks according to the order of the nodes in the data structure and decompresses the merged file blocks to obtain the data file.
Optionally, the data structure includes but is not limited to: a set, a linked list, or a stack.
Optionally, the data splitting module 23 is further configured to: while mapping the file blocks into the corresponding linked list, name the file blocks according to a preset naming rule, the storage format of the linked list being of the key-value type.
Optionally, the data synchronization module 24 is further configured to: when the received file blocks are merged to generate the data file, merge and decompress the file blocks according to the preset naming rule to generate the data file.
Since the device provided by the invention corresponds to the method described above, it is not described again here.
With the method and device provided by the invention, the data to be synchronized can be automatically and efficiently compressed, split into blocks by a distributed application, and distributed block by block to multiple target nodes by broadcast; the blocks are then merged and decompressed at each target node. This achieves synchronization of the data file while ensuring synchronization performance and the reliability and security of the data.
In addition, although the operations of the method are described in a particular order in the drawings, this does not require or imply that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be merged into one step, and/or one step may be decomposed into multiple steps.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the invention in detail. It should be understood that the foregoing is merely a specific embodiment of the invention and is not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.