CN109299056A - A kind of method of data synchronization and device based on distributed file system - Google Patents

A kind of method of data synchronization and device based on distributed file system

Info

Publication number
CN109299056A
Authority
CN
China
Prior art keywords
data
server
file
virtual server
metadata
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811096362.XA
Other languages
Chinese (zh)
Other versions
CN109299056B (en)
Inventor
张慧如
周建明
冯娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weifang Engineering Vocational College
Original Assignee
Weifang Engineering Vocational College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-09-19
Filing date
2018-09-19
Publication date
2019-02-01
2018-09-19: Application filed by Weifang Engineering Vocational College
2018-09-19: Priority to CN201811096362.XA
2019-02-01: Publication of CN109299056A
2019-10-01: Application granted
2019-10-01: Publication of CN109299056B
Status: Expired - Fee Related

Abstract

The present invention relates to a data synchronization method and device based on a distributed file system. A data server uses a dual-server working mode to communicate with a physical server and a virtual server simultaneously. A database-based memory cluster and a metadata cluster are established. Data is written to a temporary data block according to the received signal difference, and the file content in the original data node is replaced with the content of the temporary data block, which achieves data synchronization. Because the physical server and the virtual server communicate interactively in a dual-server working mode, each server performs its own function, and the system switches to a single-server working mode when necessary, effectively ensuring the operating efficiency of the system. In addition, the content to be synchronized is clustered and handled separately using the memory cluster and the metadata cluster in the data store, effectively ensuring reasonable allocation of synchronization resources and accurate data synchronization.

Description

Data synchronization method and device based on distributed file system
Technical Field
The present application belongs to the field of distributed processing technologies, and in particular, to a data synchronization method and apparatus based on a distributed file system.
Background
As people's quality of life continues to improve, Internet applications are becoming ever more widespread. These applications keep developing and evolving to serve people more conveniently, while network security problems multiply, so market demand for Internet security products keeps growing. Among the many network security problems, the change or loss of important information caused by accidental deletion of files is the most serious, so websites at home and abroad continue to research techniques for preventing accidental deletion of web pages.
When web page tamper-resistant systems first appeared, their internal structure was very simple and they were divided into few functional modules. They could solve some basic website security problems but had many shortcomings. If a hacker attacked an important website with large-scale, sustained tampering, the tamper-resistant system could not protect the website. As dependence on the Internet and the volume of web page access grow, a simple tamper-resistant system can no longer cope. Web page tamper-resistant systems have therefore been gradually developed and refined to prevent web pages from being tampered with and to protect website security. As the technology has improved, the internal structure of these systems has become more complex and the number of functional modules has grown. Such a system now performs its overall function through interaction and cooperation among all of its modules, which are closely related and tightly interlinked.
Therefore, within a web page tamper-resistant system, the distributed file synchronization system described herein needs further optimization so that multi-machine publishing of files can be completed through simpler system configuration, which makes the system easier to use. The synchronization system also needs further in-depth design work on improving file transmission efficiency, so that its functions are fully optimized and it integrates better with the tamper-resistant system and performs its intended role.
Disclosure of Invention
The invention seeks to protect a data synchronization method based on a distributed file system. Overall, it adopts a dual-server working mode in which a physical server and a virtual server communicate interactively and data interaction is verified synchronously. Internally, it separates memory data from a metadata cluster, and each server performs its corresponding work, thereby achieving timely data synchronization and accurate updating.
A data synchronization method based on a distributed file system is characterized by:
monitoring the working states of a physical server and a virtual server while a data server, in a dual-server working mode, communicates with the physical server and the virtual server simultaneously;
wherein the communication interaction comprises: performing signal interaction with the physical server, and performing data interaction with the physical server and the virtual server simultaneously;
establishing a database-based memory cluster and a metadata cluster, storing the memory data of the distributed file system in a distributed database in the cluster, and storing the metadata of the distributed database in the cluster for processing;
if the data server determines that the physical server has failed and the virtual server is working normally, sending, by the data server, a single-server working mode switching instruction to the virtual server;
automatically connecting the physical server to the virtual server, managing resources hierarchically according to the virtual machine resources defined by the application logic architecture, computing an intelligent allocation of the resources, and dynamically expanding the resources online;
receiving, by the data server, a first confirmation response returned by the virtual server, and continuing the signal interaction and the data interaction with the virtual server in the single-server working mode;
and creating, by the data node of the virtual server, a temporary data block, writing data into the temporary data block according to the received signal difference, and replacing the file content in the original data node with the content of the temporary data block to achieve data synchronization.
In the invention, the physical server and the virtual server communicate interactively in a dual-server working mode, each server performs its own role, and the system switches to the single-server working mode when necessary, which effectively ensures the operating efficiency of the system. In addition, the content to be synchronized is clustered and handled separately using the memory cluster and the metadata cluster in the database, which effectively ensures reasonable allocation of synchronization resources and accurate data synchronization.
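The switch from the dual-server to the single-server working mode can be sketched as a small state machine. This is a minimal illustration, not the patented implementation: the names `Mode`, `DataServer`, `monitor`, and `send_switch`, and the use of boolean health flags, are assumptions introduced for the example.

```python
from enum import Enum


class Mode(Enum):
    DUAL = "dual-server"
    SINGLE = "single-server"


class DataServer:
    """Hypothetical data server that tracks which working mode is active."""

    def __init__(self):
        self.mode = Mode.DUAL

    def monitor(self, physical_ok, virtual_ok, send_switch):
        """Switch to single-server mode when only the virtual server is healthy.

        `send_switch` stands in for sending the mode switching instruction to
        the virtual server; it returns True once the first confirmation
        response has been received.
        """
        if self.mode is Mode.DUAL and not physical_ok and virtual_ok:
            if send_switch():
                self.mode = Mode.SINGLE
        return self.mode
```

The mode only changes after the confirmation arrives, so a lost or rejected instruction leaves the data server in dual-server mode and the check can simply be repeated on the next monitoring pass.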
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a data synchronization method based on a distributed file system according to the present invention.
Fig. 2 is a block diagram illustrating a data synchronization apparatus based on a distributed file system according to the present invention.
Detailed Description
The invention first protects a data synchronization method based on a distributed file system. Referring to Fig. 1, which is a flowchart of the method, the method is characterized by:
monitoring the working states of a physical server and a virtual server while a data server, in a dual-server working mode, communicates with the physical server and the virtual server simultaneously, wherein the communication interaction comprises: performing signal interaction with the physical server, and performing data interaction with the physical server and the virtual server simultaneously;
establishing a database-based memory cluster and a metadata cluster, storing the memory data of the distributed file system in a distributed database in the cluster, and storing the metadata of the distributed database in the cluster for processing;
if the data server determines that the physical server has failed and the virtual server is working normally, sending, by the data server, a single-server working mode switching instruction to the virtual server;
automatically connecting the physical server to the virtual server, managing resources hierarchically according to the virtual machine resources defined by the application logic architecture, computing an intelligent allocation of the resources, and dynamically expanding the resources online;
receiving, by the data server, a first confirmation response returned by the virtual server, and continuing the signal interaction and the data interaction with the virtual server in the single-server working mode;
and creating, by the data node of the virtual server, a temporary data block, writing data into the temporary data block according to the received signal difference, and replacing the file content in the original data node with the content of the temporary data block to achieve data synchronization.
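The final step, writing into a temporary data block and then replacing the original content, can be sketched as follows. The patent does not define the format of the "signal difference"; this sketch assumes it is a list of `(offset, bytes)` patches, and the function name `sync_via_temp_block` is hypothetical. A local file stands in for a data block on a data node.

```python
import os
import tempfile


def sync_via_temp_block(target_path, diffs):
    """Apply (offset, data) patches to a temporary copy, then swap it in."""
    directory = os.path.dirname(os.path.abspath(target_path))
    with open(target_path, "rb") as src:
        content = bytearray(src.read())
    for offset, data in diffs:
        end = offset + len(data)
        if end > len(content):
            # Grow the block with zero bytes when a patch extends past the end.
            content.extend(b"\x00" * (end - len(content)))
        content[offset:end] = data
    # The temporary block lives in the same directory so the final rename
    # stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Writing to a separate block first means a reader sees either the old content or the fully patched content, never a half-written file.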
Preferably, the process of communicating with the physical server and the virtual server simultaneously further includes:
the data cache server establishes a heartbeat connection with the virtual server so that the virtual server knows the current state of the data cache server, including its status, load, and update condition, and waits for task requests sent by the virtual server; after receiving a client connection request sent by the virtual server, the data cache server establishes a connection with the client, monitors the client's operation requests, and responds promptly;
after an update from the client is submitted on the current physical server, the data cache server synchronizes the update to the physical server, submits the update message to the virtual server once the update is complete, and the virtual server directs the other data cache servers to synchronize with the physical server.
Synchronizing a small file generally involves two steps: obtaining from the metadata node the location of the data node that holds the small file's index, and obtaining the stream of the data file. An output stream of the data file is obtained when writing, and an input stream when reading. When small files are accessed consecutively and belong to the same directory, this information is the same for all of them. The client therefore caches the index location information for the directory together with the input and output streams of the data file under that directory. This reduces interaction with the metadata node and speeds up file access, and the metadata node does not need to modify the update flag frequently; it only does so when the flag changes. By default the client caches 20 index location entries, and the user can change this number. The cache uses the LRU eviction policy by default, and no locking is applied to it because the client is not multithreaded.
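A client-side cache of this kind can be sketched with an ordered dictionary. This is an illustrative sketch only: the class name `IndexLocationCache` and the shape of the cached value are assumptions, while the default capacity of 20 and the LRU policy come from the description. As in the description, no lock is used because the client is assumed to be single-threaded.

```python
from collections import OrderedDict


class IndexLocationCache:
    """LRU cache mapping a directory to its index location and stream info."""

    def __init__(self, capacity=20):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, directory):
        """Return the cached info, or None on a miss (ask the metadata node)."""
        if directory not in self._entries:
            return None
        # A hit makes this directory the most recently used entry.
        self._entries.move_to_end(directory)
        return self._entries[directory]

    def put(self, directory, info):
        """Insert or refresh an entry, evicting the least recently used one."""
        self._entries[directory] = info
        self._entries.move_to_end(directory)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
```

A miss returns `None`, at which point the client would query the metadata node and call `put` with the result.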
Because the client caches the index location and update flag of small files, it does not need to access the metadata nodes frequently. This greatly improves system performance but introduces the problem of keeping the indexes in each data node consistent. According to the selection rule, the primary data node in the index location mapping table is the location where indexes are created, and its content is always up to date. After client 1 queries replica data node 1, it changes the update flag in its cache to N and modifies the corresponding entry in the metadata node's mapping table. Client 2 then obtains the mapping table information and sees that the update flag of replica data node 1 is N. If client 1 now creates an index, the update flag of replica data node 1 becomes Y, but client 2 has no way of learning this on its own, so it will not see the small file that client 1 just created.
Further preferably, the synchronization control node among the data nodes obtains a copy list from the metadata node of the source file system according to the source path entered by the client, creates a thread pool, and assigns source files to each thread according to the copy list. The copy list is a list of all source files under the source path and includes the file name, size, and file path of each source file.
Each thread of the synchronization control node obtains the metadata of its assigned source files from the metadata node of the source file system and, according to that metadata, obtains the check code of each data block contained in the source file from the corresponding source data node.
Each thread of the synchronization control node obtains the metadata of the target file corresponding to each source file from the metadata node of the target file system, compares the sizes of the source file and the target file, and, according to the comparison result, asks the metadata node of the target file system to create or delete data blocks of the target file so that the target file has the same size as the source file.
Each thread of the synchronization control node then obtains the metadata of each target file from the metadata node of the target file system again and, according to that metadata, obtains the check codes of all data blocks contained in each target file from the corresponding target data nodes.
Each thread of the synchronization control node generates a file check code list from the metadata of its source and target files and the check codes of all source and target data blocks. The file check code list comprises: the serial number of the data block, the ID of the source data block, the check code of the source data block, the ID of the source data node, the ID of the target data block, the check code of the target data block, the ID of the target data node, and a flag bit indicating whether the target data block is a newly created data block.
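The file check code list can be modeled as one record per data block. The sketch below follows the fields listed above; the class and function names are hypothetical, and the rule in `blocks_to_copy` (copy a block when it is newly created or its check codes differ) is an assumption about how the list would be used, since the description does not state it.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    block_id: str
    checksum: str
    node_id: str


@dataclass
class ChecksumEntry:
    serial: int          # serial number of the data block within the file
    source: BlockInfo    # source block ID, check code, and data node ID
    target: BlockInfo    # target block ID, check code, and data node ID
    newly_created: bool  # flag bit: target block was just created


def build_checksum_list(source_blocks, target_blocks, new_target_ids=frozenset()):
    """Pair source and target blocks by position into a check code list."""
    if len(source_blocks) != len(target_blocks):
        raise ValueError("target must be resized to match the source first")
    return [
        ChecksumEntry(i, s, t, t.block_id in new_target_ids)
        for i, (s, t) in enumerate(zip(source_blocks, target_blocks))
    ]


def blocks_to_copy(entries):
    """Serial numbers of blocks that are new or whose check codes differ."""
    return [
        e.serial
        for e in entries
        if e.newly_created or e.source.checksum != e.target.checksum
    ]
```

Because the target file is resized before the list is built, the two block lists have equal length and can be paired by position.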
The method can use random segmentation of the source data space, average segmentation along the dimension with the longest span, clustering segmentation in each dimension, and the like. A cluster of a large distributed file system usually spans several racks; communication between computers on different racks has to pass through a switch, so its transmission cost is high. In most cases, the bandwidth between two computers in different racks is lower than that between two computers in the same rack. The replica strategy of existing distributed file systems stores copies in two different racks, which prevents data loss when one rack fails. When data is read, the node that stores the source data and is closest to the client can be accessed according to the principle of proximity, or the bandwidth between different racks can be used to reduce read time. Moving the computation close to the data storage node is significantly more efficient and less expensive than moving the data close to the computation node.
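The principle of proximity can be illustrated with a simple rack-aware distance. The distance values (0 for the same host, 2 for the same rack, 4 for a different rack) and the `(rack, host)` tuple layout are assumptions chosen for the example; the description only says that the closest node holding the data is preferred.

```python
def network_distance(client, replica):
    """Distance between two (rack, host) locations; lower is closer."""
    if client == replica:
        return 0  # same host
    if client[0] == replica[0]:
        return 2  # same rack, different host
    return 4      # different racks, traffic crosses a switch


def nearest_replica(client, replicas):
    """Pick the replica location closest to the client."""
    return min(replicas, key=lambda r: network_distance(client, r))
```

For a client at `("rack1", "h1")` and replicas on three racks, the replica on `rack1` is chosen because it avoids the inter-rack switch.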
Further, preferably, the distributed file system works with a MapReduce program. According to the number of directories to create and the number of small files to create under each directory, both given by the user, the MapReduce program creates as many control files in the distributed file system as there are directories, and these control files serve as the input files of the Map function.
In the write test, the Map function uses the massive small file storage system to create small files of the specified number and size. In the read test, it uses the storage system's read interface to read the small files in the directories created during the write test.
The write test and the read test use the same Reduce function. It aggregates the output data of each Map, such as the total size, total number, and total running time of the files tested by the MapReduce program, and stores the data in the distributed file system.
The result analysis function reads the statistics produced by Reduce from the distributed file system and calculates, using a given formula, figures such as the speed at which the distributed file system or the massive small file storage system writes and reads small files.
Each time the MapReduce program finishes running, the memory usage of the massive small file storage system and of the distributed file system must be recorded.
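The Reduce and result analysis steps can be sketched in plain Python. The description refers to "a given formula" without stating it, so the throughput formula here (total bytes divided by total running time, and total files divided by total running time) is an assumption, as are the function names and the dictionary keys.

```python
def reduce_stats(map_outputs):
    """Aggregate per-Map results into totals, as the Reduce step does."""
    return {
        "bytes": sum(o["bytes"] for o in map_outputs),
        "files": sum(o["files"] for o in map_outputs),
        "seconds": sum(o["seconds"] for o in map_outputs),
    }


def analyze(stats):
    """Derive throughput figures from the totals (assumed formula)."""
    if stats["seconds"] <= 0:
        raise ValueError("total running time must be positive")
    return {
        "bytes_per_second": stats["bytes"] / stats["seconds"],
        "files_per_second": stats["files"] / stats["seconds"],
    }
```

Running the same two functions over the write-test and read-test outputs gives directly comparable write and read speeds.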
Preferably, the metadata cluster authorizes users to access the data cache server cluster, quickly establishes the connection between the client and the data server, monitors the state of each data cache server in real time, and assigns to the user the data cache server that can provide the best service according to the state information. A cache consistency strategy is used to ensure the consistency and stability of user data in the data cache server cluster. The metadata cluster is controlled by the virtual server, is responsible for data interaction with the client, and monitors the data state of each user in real time. It also submits the state information to the virtual server, which ensures that the control server can trace the state of user data. A heartbeat connection is established between the service quality monitoring server and the transaction control server, and the factors that affect service quality, such as the available network bandwidth and CPU utilization of the service quality monitoring server, are transmitted to the virtual server.
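Selecting the data cache server that can provide the best service can be sketched as a scoring function over the monitored state. The description names available bandwidth and CPU utilization as relevant factors but gives no selection rule, so the score used here (bandwidth multiplied by idle CPU fraction) and all field names are assumptions for illustration.

```python
def pick_cache_server(servers):
    """Return the name of the healthy server with the best score, or None.

    Each server is a dict with the hypothetical keys "name", "alive",
    "bandwidth_mbps", and "cpu_util" (a fraction between 0 and 1).
    """
    healthy = [s for s in servers if s["alive"]]
    if not healthy:
        return None
    best = max(
        healthy,
        key=lambda s: s["bandwidth_mbps"] * (1.0 - s["cpu_util"]),
    )
    return best["name"]
```

Servers that miss their heartbeat would be marked `alive: False` and are skipped, so a user is never assigned to an unreachable cache server.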
In a storage system, each physical file corresponds to a logical representation, which constitutes its metadata information. To read a file, the logical file is read first, the corresponding data blocks are then fetched from the storage system in the order given by the metadata information, and finally a copy of the physical file is restored. The data file stores the data of small files in a key/value data structure. This reduces the volume of metadata for massive numbers of small files in the distributed file system, speeds up access to small files by reducing interaction with the metadata nodes, facilitates MapReduce-based data processing, and supports distributed computing.
All small files that the client stores in the same directory are stored in the data file of that directory, where the data file is a file in the distributed file system. An index is generated at the same time; it records the specific position of each small file within the data file together with other related information. The index is handed to the data nodes for maintenance and management, and the data nodes provide the index service to the client. The metadata node needs to record which data node maintains the index of the small files. When a client needs to send an index service request for a small file in a given directory to a data node, it must first obtain the position of that data node from the distributed file system. The client-side cache mechanism caches and maintains the data node positions and data file information of the small file indexes, which reduces the number of accesses to the metadata nodes and greatly improves small file access speed.
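Packing the small files of one directory into a single data file with a position index can be sketched as follows. This in-memory model is an assumption made for illustration: a `bytearray` stands in for the data file in the distributed file system, a dictionary stands in for the index maintained by a data node, and the class name `PackedDirectory` is hypothetical.

```python
class PackedDirectory:
    """All small files of one directory stored in a single data file."""

    def __init__(self):
        self._data = bytearray()  # stands in for the directory's data file
        self._index = {}          # file name -> (offset, length)

    def write(self, name, payload):
        """Append a small file and record its position in the index."""
        self._index[name] = (len(self._data), len(payload))
        self._data.extend(payload)

    def read(self, name):
        """Look up the position in the index and slice the data file."""
        offset, length = self._index[name]
        return bytes(self._data[offset:offset + length])
```

With this layout the metadata node tracks one data file per directory instead of one entry per small file, which is where the metadata saving comes from.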
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

the data cache server establishes a heartbeat connection with the virtual server so that the virtual server knows the current state of the data cache server, including its status, load, and update condition, and waits for task requests sent by the virtual server; after receiving a client connection request sent by the virtual server, the data cache server establishes a connection with the client, monitors the client's operation requests, and responds promptly; after an update from the client is submitted on the current physical server, the data cache server synchronizes the update to the physical server, submits the update message to the virtual server once the update is complete, and the virtual server directs the other data cache servers to synchronize with the physical server.
CN201811096362.XA, priority date 2018-09-19, filing date 2018-09-19: A kind of method of data synchronization and device based on distributed file system. Expired - Fee Related. CN109299056B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811096362.XA (CN109299056B, en) | 2018-09-19 | 2018-09-19 | A kind of method of data synchronization and device based on distributed file system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811096362.XA (CN109299056B, en) | 2018-09-19 | 2018-09-19 | A kind of method of data synchronization and device based on distributed file system

Publications (2)

Publication Number | Publication Date
CN109299056A (en) | 2019-02-01
CN109299056B (en) | 2019-10-01

Family

ID=65163433

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201811096362.XA (Expired - Fee Related; CN109299056B, en) | A kind of method of data synchronization and device based on distributed file system | 2018-09-19 | 2018-09-19

Country Status (1)

Country | Link
CN (1) | CN109299056B (en)


Also Published As

Publication number | Publication date
CN109299056B (en) | 2019-10-01


Legal Events

Code | Title | Description
PB01 | Publication | Publication
SE01 | Entry into force of request for substantive examination | Entry into force of request for substantive examination
GR01 | Patent grant | Patent grant
CF01 | Termination of patent right due to non-payment of annual fee | Termination of patent right due to non-payment of annual fee

Granted publication date: 2019-10-01

Termination date: 2020-09-19

