CN107861686B - File storage method, server and computer readable storage medium - Google Patents

File storage method, server and computer readable storage medium

Info

Publication number
CN107861686B
Authority
CN
China
Prior art keywords
file
storage
server
distributed
files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710885384.3A
Other languages
Chinese (zh)
Other versions
CN107861686A (en
Inventor
卢道和
陈晓峰
杨军
钱碧伟
黎君
胡思文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN201710885384.3A
Publication of CN107861686A
Application granted
Publication of CN107861686B
Active
Anticipated expiration


Abstract


The invention discloses a file storage method applied to an Internet data center. The Internet data center includes a file processing system, a distributed file system, and a distributed storage system, and the file processing system includes a server and a client. The method includes: the server receives files uploaded by a sender through the client; caches the received files in a temporary folder and records the storage location information of each file in the distributed storage system; merges the files in the temporary folder to obtain a merged file and stores the merged file in the distributed file system; and, based on the merged file, updates the corresponding storage location information in the distributed storage system. The invention also discloses a server and a computer-readable storage medium. By merging files, the invention increases the amount of data that can be stored, and by storing files in a distributed file system it allows a larger number of files to be stored.


Description

File storage method, server and computer readable storage medium
Technical Field
The present invention relates to the field of application technologies, and in particular, to a file storage method, a server, and a computer-readable storage medium.
Background
Traditional mass file storage is implemented with a dedicated file server such as NAS (Network Attached Storage). A NAS is a dedicated data storage server that includes a storage device (e.g., a disk array, a CD/DVD drive, a tape drive, or a removable storage medium) and embedded system software, and it provides cross-platform file sharing. A NAS usually occupies its own node on a LAN (Local Area Network) and allows users to access data over the network without the intervention of an application server.
In the existing file storage architecture, a plurality of front-end servers share a back-end NAS device through a dedicated storage network, and storage space on the back-end NAS device is shared with the front-end hosts through the CIFS (Common Internet File System) and NFS (Network File System) protocols, so that the same directory or file can be read and written concurrently. The file system resides in the back-end storage system, and the connection uses standard Ethernet links and the TCP (Transmission Control Protocol)/IP (Internet Protocol) protocols, so that file storage can be shared among multiple systems. However, as time passes and business grows, the data size keeps increasing while the storage capacity of a NAS device is limited, so the conventional file storage mode has difficulty coping with explosive data growth; that is, as the data size grows, the storage capacity of the existing file storage mode can hardly meet the demand.
Disclosure of Invention
The main object of the invention is to provide a file storage method, a server, and a computer-readable storage medium that solve the technical problem that the existing file storage mode has difficulty meeting storage requirements as the data volume grows.
In order to achieve the above object, the present invention provides a file storage method applied to an internet data center, where the internet data center includes a file processing system, a distributed file system, and a distributed storage system, the file processing system includes a server and a client, and the file storage method includes:
a server of the file processing system receives a file uploaded by a sender through a client;
caching the received files into a temporary folder, and recording storage position information of each file in a distributed storage system;
merging the files in the temporary folder to obtain merged files, and storing the merged files in a distributed file system;
and updating the corresponding storage position information in the distributed storage system based on the merged file.
Optionally, the step of merging the files in the temporary folder to obtain a merged file includes:
the server side scans each file in the temporary folder;
and acquiring a combined file, determining, among the scanned files, a file whose size after being merged with the combined file would remain smaller than a preset threshold, and merging the determined file into the combined file.
Optionally, the method further comprises:
when a file query instruction is received, determining index information corresponding to the file query instruction;
searching a merged file pointed by the index information in a distributed file system;
and restoring the merged file to restore the file corresponding to the index information from the merged file.
Optionally, after the step of storing the merged file in the distributed file system, the method further includes:
the server generates file identification information and file hash information based on files stored in the distributed file system;
feeding back file identification information and file hash information to the sender through the client, so that the sender can transmit the file identification information and the file hash information to a receiver;
and when the client receives the file identification information sent by the receiver, extracting the file corresponding to the file identification information from the distributed file system, and feeding the file back to the receiver so that the receiver can check the file through the file hash information, and acquiring the file when the check is successful.
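The hand-off described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the source does not name the hash algorithm or the identifier format, so SHA-256 and a UUID are assumptions, `publish` and `fetch_verified` are hypothetical names, and an in-memory dict stands in for the distributed file system.

```python
import hashlib
import uuid

# Hypothetical in-memory stand-in for the distributed file system.
store = {}

def publish(content: bytes):
    """Store a file and return (file_id, file_hash) to hand back to the sender."""
    file_id = uuid.uuid4().hex                       # assumed identifier format
    file_hash = hashlib.sha256(content).hexdigest()  # assumed hash algorithm
    store[file_id] = content
    return file_id, file_hash

def fetch_verified(file_id: str, expected_hash: str) -> bytes:
    """Receiver side: fetch by identifier and accept only if the hash matches."""
    content = store[file_id]
    if hashlib.sha256(content).hexdigest() != expected_hash:
        raise ValueError("hash check failed; file rejected")
    return content
```

The sender passes both values to the receiver out of band; the receiver only keeps the file when the recomputed hash equals the one it was given.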
Optionally, there are a plurality of servers, the server of the file processing system is connected with the client through a gateway, and the manner in which a file is uploaded from the client to the server includes: the gateway distributes the files uploaded by the client to the servers in turn (polling) according to a preset strategy.
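One plausible reading of the gateway behaviour is plain round-robin dispatch, sketched below. The source only says that the gateway follows "a preset strategy", so round-robin is an assumption, and the `Gateway` class and server names are hypothetical.

```python
import itertools

class Gateway:
    """Hypothetical gateway that hands each upload to the next server in turn."""

    def __init__(self, servers):
        # cycle() repeats the server list endlessly in its original order.
        self._next_server = itertools.cycle(servers)

    def dispatch(self, filename):
        """Return the (server, filename) pair chosen for this upload."""
        return next(self._next_server), filename

gateway = Gateway(["fps-server-1", "fps-server-2", "fps-server-3"])
assignments = [gateway.dispatch(f"file-{i}")[0] for i in range(5)]
```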
Optionally, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the method further includes:
the server side scans each file in the distributed file system to monitor the storage duration of each file;
and deleting the files in the distributed file system and deleting the storage position information of the files in the distributed storage system when the storage duration of the files reaches the preset duration.
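A minimal sketch of the expiry scan, assuming each stored file carries the time at which it was stored. Two dicts stand in for the distributed file system and the storage location records; `purge_expired` and its parameters are hypothetical names.

```python
import time

def purge_expired(files, index, max_age_seconds, now=None):
    """Delete files whose storage duration has reached max_age_seconds.

    files: dict path -> (stored_at, content), stand-in for the distributed file system
    index: dict file_id -> path, stand-in for the storage location information
    Returns the sorted list of deleted paths.
    """
    now = time.time() if now is None else now
    expired = {path for path, (stored_at, _) in files.items()
               if now - stored_at >= max_age_seconds}
    for path in expired:
        del files[path]                      # remove the file itself
    for file_id in [fid for fid, path in index.items() if path in expired]:
        del index[file_id]                   # remove its location record
    return sorted(expired)
```

Deleting the location record together with the file keeps the index from pointing at data that no longer exists.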
Optionally, the internet data center further includes a distributed application coordination service, and before the step of scanning, by the server, each file in the distributed file system to monitor the storage duration of each file, the method further includes:
the server sends a request to acquire a deletion lock to the distributed application coordination service;
and when the lock is successfully acquired, scanning each file in the distributed file system to monitor the storage duration of each file.
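The point of the lock is that only one of several servers runs the deletion scan at a time. The sketch below shows that control flow with an in-process stand-in; `LockService` is hypothetical and is not the API of any real coordination service, and the lock name `delete-lock` is an assumption.

```python
import threading

class LockService:
    """In-process stand-in for the distributed coordination service (hypothetical)."""

    def __init__(self):
        self._owners = {}
        self._guard = threading.Lock()

    def try_acquire(self, name, owner):
        """Grant the named lock if nobody holds it; never blocks."""
        with self._guard:
            if name in self._owners:
                return False
            self._owners[name] = owner
            return True

    def release(self, name, owner):
        """Release the named lock, but only if this owner holds it."""
        with self._guard:
            if self._owners.get(name) == owner:
                del self._owners[name]

def run_cleanup(service, owner, scan):
    """Run scan() only if this server wins the deletion lock."""
    if not service.try_acquire("delete-lock", owner):
        return False          # another server is already scanning
    try:
        scan()
        return True
    finally:
        service.release("delete-lock", owner)
```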
Optionally, the server is located in a primary internet data center, and after the step of updating, based on the merged file, the corresponding storage location information in the distributed storage system, when a backup internet data center exists in the system, the method includes:
and the server synchronizes the stored file to the server of the file processing system where the standby internet data center is located so that the server of the file processing system where the standby internet data center is located executes the file storage operation.
In addition, in order to achieve the above object, the present invention further provides a server, where the server includes a memory, a processor, and a file storage program stored in the memory and executable on the processor, and the file storage program, when executed by the processor, implements the steps of the file storage method as described above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a file storage program which, when executed by a processor, implements the steps of the file storage method as described above.
According to the technical scheme provided by the invention, the server of the file processing system first receives the file uploaded by a sender through the client, then caches the received file in a temporary folder and records the storage location information of each file in the distributed storage system, then merges the files in the temporary folder to obtain a merged file and stores the merged file in the distributed file system, and finally updates the corresponding storage location information in the distributed storage system based on the merged file, so that the file can later be read according to the storage location information. In this scheme the received files are merged and the merged files are stored in the distributed file system. Merging increases the number of files that the system can store, and because the distributed file system is scalable, storing files through it allows still more files to be stored. Compared with the existing file storage mode, the scheme offers larger storage capacity and is better suited to storing a large number of small files.
Drawings
FIG. 1 is a schematic diagram of a server-side architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a file storage method according to a first embodiment of the present invention;
FIG. 3 is a diagram of a file storage architecture of the present invention;
FIG. 4 is a schematic illustration of file merging according to the present invention;
FIG. 5 is a flowchart illustrating a second embodiment of a file storage method according to the present invention;
FIG. 6 is an exemplary diagram of a file transfer according to the present invention;
FIG. 7 is a flowchart illustrating a file storage method according to a third embodiment of the present invention;
FIG. 8 is a diagram illustrating deletion of a file according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The solution of the embodiment of the invention is mainly as follows: the server of the file processing system first receives the file uploaded by the sender through the client, then caches the received file in a temporary folder and records the storage location information of each file in the distributed storage system, then merges the files in the temporary folder to obtain a merged file and stores the merged file in the distributed file system, and finally updates the corresponding storage location information in the distributed storage system based on the merged file, so that the file can later be read according to the storage location information. This solves the problem that the existing file storage mode has difficulty meeting storage requirements.
The existing file storage method also has the following defects:
the file storage scheme has no life cycle management function and does not support features such as deleting temporary files on expiry, which easily leads to excessive stored data;
it is not suited to access by a large number of systems, and installation and deployment are relatively troublesome.
Based on these problems in the prior art, the invention builds an FPS (File Processing System). The FPS supports the storage of mass data and ensures high availability of the service by storing multiple copies of the data across racks and machine rooms. The main application scenarios include:
(1) providing an intermediate platform for file exchange among different systems, for example, system A providing a reconciliation file to system B through the intermediate platform for reconciliation;
(2) providing a data storage platform based on file life cycle management, which supports mass data storage and automatically deletes files once their storage period expires.
Terms of art used in this description:
Hadoop: a distributed system infrastructure on which users can build and use a distributed computing platform and develop and run applications that process mass data.
HDFS: the Hadoop Distributed File System. HDFS is highly fault-tolerant and designed for deployment on low-cost hardware; it provides high-throughput access to application data and suits applications with very large data sets.
HBase: a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, a large-scale structured storage cluster can be built on low-cost PC servers. It belongs to the Hadoop ecosystem.
ZooKeeper: a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming service, distributed synchronization, and group services. It belongs to the Hadoop ecosystem.
TGW: short for Tencent GateWay, a system that provides unified multi-network access, forwards external network requests, and supports automatic load balancing. The TGW may be referred to simply as the gateway.
NAS: Network Attached Storage, a device that is connected to a network and provides data storage, also called "network storage". It is a dedicated data storage server.
RMB: a message bus system used for RPC messaging among multiple systems.
As shown in fig. 1, fig. 1 is a schematic diagram of a server-side structure of a hardware operating environment according to an embodiment of the present invention.
The server in the embodiment of the present invention may be a Personal Computer (PC), a tablet computer, a portable computer, or other terminal equipment with a display function.
As shown in FIG. 1, the server may include: a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface (e.g., for connecting a wired keyboard or a wired mouse) and a wireless interface (e.g., for connecting a wireless keyboard or a wireless mouse). The network interface 1004 may optionally include a standard wired interface (for connecting to a wired network) and a wireless interface (e.g., a Wi-Fi, Bluetooth, or infrared interface, for connecting to a wireless network). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Alternatively, the memory 1005 may be a storage device separate from the processor 1001.
Optionally, the server may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like.
Those skilled in the art will appreciate that the server side architecture shown in FIG. 1 does not constitute a limitation of the server side, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in FIG. 1, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a file storage program. The operating system is a program that manages and controls the server and its software resources, and it supports the operation of the network communication module, the user interface module, the file storage program, and other programs or software; the network communication module manages and controls the network interface 1004; the user interface module manages and controls the user interface 1003.
In the server shown in FIG. 1, the network interface 1004 is mainly used to connect to a server of a standby internet data center and exchange data with it; the user interface 1003 is mainly used to connect to a client (user side) and exchange data with the client; and the server calls the file storage program stored in the memory 1005 through the processor 1001 and executes the following steps:
receiving a file uploaded by a sender through a client;
caching the received files into a temporary folder, and recording storage position information of each file in a distributed storage system;
merging the files in the temporary folder to obtain merged files, and storing the merged files in a distributed file system;
and updating the corresponding storage position information in the distributed storage system based on the merged file.
Further, the server calls the file storage program stored in the memory 1005 through the processor 1001 to merge the files in the temporary folder and obtain a merged file, through the following steps:
scanning each file in the temporary folder;
and acquiring a combined file, determining, among the scanned files, a file whose size after being merged with the combined file would remain smaller than a preset threshold, and merging the determined file into the combined file.
Further, the server calls the file storage program stored in the memory 1005 through the processor 1001 to implement the following steps:
when a file query instruction is received, determining index information corresponding to the file query instruction;
searching a merged file pointed by the index information in a distributed file system;
and restoring the merged file to restore the file corresponding to the index information from the merged file.
Further, after the step of storing the merged file in the distributed file system, the server calls, through the processor 1001, the file storage program stored in the memory 1005, so as to implement the following steps:
generating file identification information and file hash information based on files stored in a distributed file system;
feeding back file identification information and file hash information to the sender through the client, so that the sender can transmit the file identification information and the file hash information to a receiver;
and when the client receives the file identification information sent by the receiver, extracting the file corresponding to the file identification information from the distributed file system, and feeding the file back to the receiver so that the receiver can check the file through the file hash information, and acquiring the file when the check is successful.
Further, there are a plurality of servers, the server of the file processing system is connected with the client through a gateway, and the manner in which a file is uploaded from the client to the server includes: the gateway distributes the files uploaded by the client to the servers in turn (polling) according to a preset strategy.
Further, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the server calls, through the processor 1001, the file storage program stored in the memory 1005, so as to implement the following steps:
scanning each file in the distributed file system to monitor the storage duration of each file;
and deleting the files in the distributed file system and deleting the storage position information of the files in the distributed storage system when the storage duration of the files reaches the preset duration.
Further, the internet data center further includes a distributed application coordination service, and before the step of scanning each file in the distributed file system to monitor the storage duration of each file, the server calls the file storage program stored in the memory 1005 through the processor 1001 to implement the following steps:
sending a request to acquire a deletion lock to the distributed application coordination service;
and when the lock is successfully acquired, scanning each file in the distributed file system to monitor the storage duration of each file.
Further, the server is located in the primary internet data center, and when a standby internet data center exists in the system, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the server calls the file storage program stored in the memory 1005 through the processor 1001 to implement the following steps:
and synchronizing the stored files to a server of a file processing system where the standby internet data center is located so that the server of the file processing system where the standby internet data center is located executes file storage operation.
Based on the hardware structure of the server, the file storage method provided by the invention has various embodiments.
Referring to fig. 2, fig. 2 is a flowchart illustrating a file storage method according to a first embodiment of the present invention.
In this embodiment, the file storage method is applied to an internet data center, where the internet data center includes a file processing system, a distributed file system, and a distributed storage system, the file processing system includes a server and a client, and the file storage method includes:
a server of the file processing system receives a file uploaded by a sender through a client; caching the received files into a temporary folder, and recording storage position information of each file in a distributed storage system; merging the files in the temporary folder to obtain merged files, and storing the merged files in a distributed file system; and updating the corresponding storage position information in the distributed storage system based on the merged file.
In this embodiment, the file storage method is applied to a server of the file processing system (FPS) in an IDC (Internet Data Center), and the server may be the server shown in FIG. 1.
It should be noted that the embodiment of the present invention provides an FPS (File Processing System), an independently developed storage and processing system for small files that offers features such as life cycle management and cross-machine-room disaster recovery.
In the implementation of the present invention, the structure diagram of the IDC may refer to fig. 3:
the IDC comprises a file processing system FPS, a distributed file system HDFS and a distributed storage system Hbase, wherein the FPS mainly comprises two parts: FPS-Client and FPS-Server. The business program uses the functions provided by the FPS by integrating the FPS-Client.
FPS-Client: the client of the FPS, provided externally in Java, C, and Python versions.
FPS-Server: the background server program of the FPS provides main service logic processing, including functions of authority control, file storage and reading, file life cycle management and the like.
As can be seen from FIG. 3, the server and the client of the file processing system are connected through a gateway. The gateway is a TGW (Tencent GateWay), which performs load balancing when forwarding files.
It should be noted that the numbers of FPS servers and clients are not limited and may be set according to the actual situation. When there are a plurality of servers, the manner in which a file is uploaded from the client to the server includes: the gateway distributes the files uploaded by the client to the servers in turn (polling) according to a preset strategy.
That is, all traffic uploaded from the FPS-Client to the FPS-Server passes through the gateway (the TGW), and the TGW sends the files to the FPS-Servers in turn. After receiving a file, the FPS-Server stores the content of the file in HDFS (the distributed file system) and records the storage location information of the file in HBase (the distributed storage system).
It should be noted that HDFS is a distributed file system that uses ordinary PCs as storage. A file generally has multiple replicas, for example three, and machines can be added dynamically and linearly, so capacity can be expanded without downtime to keep up with the exponential growth of Internet services. Meanwhile, the storage location information of the files is stored in HBase, which can hold billions of file index entries. Like HDFS, HBase is a member of the Hadoop ecosystem, and its storage performance can be improved simply by adding nodes.
In the embodiment of the invention, the FPS-Client and the FPS-Server interact over HTTP (HyperText Transfer Protocol).
The following are specific steps of implementing file storage in this embodiment:
Step S10, the server of the file processing system receives the file uploaded by the sender through the client;
That is, the server receives, through the TGW, the file uploaded by the client.
Step S20, caching the received files into a temporary folder, and recording the storage location information of each file in a distributed storage system;
In this embodiment, as shown in FIG. 3, the FPS-Server receives a file sent by a sender through the FPS-Client. After receiving the file, the FPS-Server first caches it in a temporary folder, which is preferably a designated directory of the HDFS, and then records the storage location information of each file in the distributed storage system.
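Steps S10 and S20 can be sketched as follows, under stated assumptions: a local directory stands in for the HDFS temporary directory, a dict stands in for the storage location records, and the record fields `path`, `offset`, and `length` are illustrative choices that do not come from the source. `cache_upload` is a hypothetical name.

```python
import os
import tempfile

def cache_upload(temp_dir, index, file_id, content):
    """Write an uploaded file into the temporary folder and record its location."""
    path = os.path.join(temp_dir, file_id)
    with open(path, "wb") as f:
        f.write(content)
    # The file is still standalone, so it starts at offset 0 of its own path.
    index[file_id] = {"path": path, "offset": 0, "length": len(content)}
    return path

# Usage: cache one upload and check that it landed in the temporary folder.
with tempfile.TemporaryDirectory() as temp_dir:
    index = {}
    cached_path = cache_upload(temp_dir, index, "f1", b"hello")
    cached_ok = os.path.exists(cached_path)
```

Recording the location at upload time means a file can be found even before it has been merged.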
Step S30, merging each file in the temporary folder to obtain a merged file, and storing the merged file in the distributed file system;
Specifically, the step of "merging each file in the temporary folder to obtain a merged file" includes:
Step 1, the server scans the directory of the temporary folder and requests the lock corresponding to the directory;
Step 2, once the lock is acquired, the server scans each file in the temporary folder;
Step 3, acquiring the combined file, determining, among the scanned files, the files whose size after being merged with the combined file would remain smaller than a preset threshold, and merging the determined files into the combined file.
Further, after step 3, the method further comprises:
Step 4, deleting from the temporary folder the files that have been merged.
That is, after the uploaded files are cached in the temporary folder, the FPS-Server scans the directory of the temporary folder and tries to obtain the lock corresponding to the directory. If the lock is obtained, the FPS-Server scans the size of each file in the temporary folder and merges those files whose size, once merged with the combined file, remains smaller than a preset threshold. In this embodiment the preset threshold is set according to the actual situation and is not limited here. The files are merged as follows: a preset combined file is acquired; each file in the temporary folder is scanned; the files whose size after merging with the combined file remains smaller than the preset threshold are determined and merged into the combined file; after merging, the combined file is stored in the distributed file system (HDFS); and the index information corresponding to the merged files is updated in the distributed storage system.
In the embodiment of the present invention, each file in the temporary folder may be periodically scanned, or each file in the folder may be scanned in real time, then the scanned small files are merged, and the storage location information after merging of the small files is updated in the HBase. The small files are merged by the FPS-Server, so that the files subsequently stored in the distributed file system are not scattered, the space occupied by the stored files can be reduced, and the memory space of cluster nodes can be obviously saved.
For a better understanding of the embodiment, refer to FIG. 4. First, the FPS-Server scans the temporary directory of the HDFS and obtains the lock corresponding to the directory. Once the lock is obtained, it scans each file under the directory and obtains the index information of the small files from the distributed storage system. It then obtains a combined file and merges the small files into it. Finally, it updates the index information and deletes the original small files from the distributed file system.
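The merge pass can be sketched as below. This is an illustration under assumptions, not the patented implementation: local directories stand in for HDFS, the directory lock is omitted, and the index fields `path`, `offset`, and `length` are hypothetical. The combined file must live outside the temporary folder so that the scan does not pick it up.

```python
import os
import tempfile

def merge_small_files(temp_dir, combined_path, index, threshold):
    """Append small files to the combined file while it stays under threshold."""
    merged = []
    combined_size = os.path.getsize(combined_path) if os.path.exists(combined_path) else 0
    with open(combined_path, "ab") as combined:
        for name in sorted(os.listdir(temp_dir)):
            path = os.path.join(temp_dir, name)
            size = os.path.getsize(path)
            if combined_size + size >= threshold:
                continue  # merging would reach the threshold; leave the file in place
            with open(path, "rb") as src:
                combined.write(src.read())
            # Point the index at the file's slice inside the combined file.
            index[name] = {"path": combined_path, "offset": combined_size, "length": size}
            combined_size += size
            os.remove(path)  # the original small file is no longer needed
            merged.append(name)
    return merged

# Usage: three files, of which only the first two fit under the threshold.
with tempfile.TemporaryDirectory() as tmp, tempfile.TemporaryDirectory() as out:
    for name, data in [("a", b"11"), ("b", b"222"), ("c", b"4" * 10)]:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    index = {}
    merged = merge_small_files(tmp, os.path.join(out, "combined.bin"), index, threshold=8)
    left_over = os.listdir(tmp)
```

Storing an offset and a length per file is one simple way to make the later restore step possible; the source does not say how its index encodes the position inside the merged file.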
Step S40, based on the merged file, updating the corresponding storage location information in the distributed storage system.
After the merged file is obtained, the FPS-Server updates the distributed storage system based on the merged file, that is, it writes the storage location information of the merged file into the distributed storage system, so that when a file is subsequently searched for, the FPS-Server can locate it according to the storage location information.
In addition, in the embodiment of the present invention, the method further includes:
step A, when a file query instruction is received, determining index information corresponding to the file query instruction;
step B, searching the distributed file system for the merged file pointed to by the index information;
step C, restoring the merged file to restore the file corresponding to the index information from the merged file.
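Steps A to C can be illustrated with a short sketch. It assumes, purely for illustration, that the index information is a tuple of (combined file name, byte offset, length); the text does not specify the index layout, and `restore_file` is a hypothetical name.

```python
def restore_file(file_name, index, combined_files):
    """Recover one original file from the merged file that contains it.

    index: dict of file name -> (combined name, offset, length) (assumed layout).
    combined_files: dict of combined name -> bytes (stand-in for HDFS).
    """
    # Step A: determine the index information for the requested file.
    combined_name, offset, length = index[file_name]
    # Step B: look up the merged file that the index information points to.
    blob = combined_files[combined_name]
    # Step C: cut the original file back out of the merged file.
    return bytes(blob[offset:offset + length])


example_index = {
    "a.txt": ("combined-0001", 0, 5),
    "b.txt": ("combined-0001", 5, 6),
}
example_hdfs = {"combined-0001": b"helloworld!"}
restored = restore_file("b.txt", example_index, example_hdfs)
```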
According to the technical scheme provided by the embodiment, a server of the file processing system receives a file uploaded by a sender through a client, caches the received file in a temporary folder, and records the storage location information of each file in the distributed storage system. It then merges the files in the temporary folder to obtain a merged file, stores the merged file in the distributed file system, and finally updates the corresponding storage location information in the distributed storage system based on the merged file, so that the file can subsequently be read according to the storage location information. In this scheme, the received files are merged and the merged files are stored in the distributed file system. Merging increases the number of files the system can store, and because the distributed file system is scalable, storing files in it further increases that number. Compared with existing file storage approaches, this scheme can store a larger volume of files and is better suited to storing a large number of small files.
Further, referring to fig. 5, a second embodiment of the file storing method of the present invention is proposed based on the first embodiment.
The second embodiment of the file storage method differs from the first embodiment of the file storage method in that, after the step S30, the method further includes:
step S50, the server generates file identification information and file hash information based on the files stored in the distributed file system;
step S60, feeding back file identification information and file hash information to the sender through the client, so that the sender can transmit the file identification information and the file hash information to a receiver;
step S70, when receiving the file identification information sent by the receiver through the client, extracting the file corresponding to the file identification information in the distributed file system, and feeding back the file to the receiver, so that the receiver can check the file through the file hash information, and obtain the file when the check is successful.
In this embodiment, after the FPS-Server stores a File in the HDFS, the FPS-Server generates File identification information (File Id) and File Hash information (File Hash) corresponding to the File according to the File stored in the HDFS, and after obtaining the File Id and the File Hash, the FPS-Server feeds back the File Id and the File Hash to the sender, so that the sender transmits the File Id and the File Hash to the receiver.
It should be noted that the sender and the receiver interact with each other through the RMB message service bus. And after receiving the File Id and the File Hash, the receiving party sends the File Id to the FPS-Server by using the FPS-Client.
When the FPS-Server receives the File Id sent by the receiver through the FPS-Client, the File corresponding to the File Id is extracted from the HDFS and fed back to the receiver, so that the receiver can check the File through the File Hash and acquire the File when the check is successful. That is, the receiver downloads the File from the FPS-Server through the File Id, checks the accuracy of the File through the File Hash, and completes the downloading and the File accuracy check in the FPS-Client.
For better understanding of the embodiment, referring to fig. 6, the sender uploads the File to the FPS, and after the File is successfully uploaded, the FPS returns the File Id and the File Hash of the File to the sender, and after the sender receives the File Id and the File Hash, the sender sends the File Id and the File Hash to the receiver through the RMB message service bus.
In the embodiment, file transmission between the sender and the receiver is realized through each system in the internet data center, and the file identification information and the file hash information are used for verifying the file, so that the accuracy of file transmission is improved.
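The File Id / File Hash exchange of this embodiment can be sketched as below. The text does not name a hash algorithm or an identifier format, so SHA-256 and a random UUID are assumptions made for the example, as are the function names `register_file` and `download_and_verify`; a dictionary stands in for the HDFS.

```python
import hashlib
import uuid


def register_file(content, store):
    """Store a file and return (file_id, file_hash) for the sender."""
    file_id = uuid.uuid4().hex                       # assumed identifier format
    file_hash = hashlib.sha256(content).hexdigest()  # assumed hash algorithm
    store[file_id] = content
    return file_id, file_hash


def download_and_verify(file_id, expected_hash, store):
    """Fetch a file by id and accept it only if its hash matches."""
    content = store[file_id]
    if hashlib.sha256(content).hexdigest() != expected_hash:
        raise ValueError("file hash mismatch")
    return content


storage = {}
# Sender side: upload, then pass the id and hash on to the receiver.
sent_id, sent_hash = register_file(b"quarterly report", storage)
# Receiver side: download by id and check the content against the hash.
received = download_and_verify(sent_id, sent_hash, storage)
```

Because the hash travels from sender to receiver over a separate channel (the message bus in the text), the receiver can detect a file that was altered or corrupted in storage or in transit.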
Further, referring to fig. 7, a third embodiment of the file storing method of the present invention is proposed based on the first embodiment.
The third embodiment of the file storage method differs from the first embodiment of the file storage method in that, after the step S40, the method includes:
step S80, the server scans each file in the distributed file system to monitor the storage duration of each file;
and step S90, deleting the files in the distributed file system and deleting the storage position information of the files in the distributed storage system when the storage duration of the files reaches a preset duration.
In this embodiment, the server preferably scans each file in the distributed file system on a timer. That is, after files are stored in the distributed file system HDFS, the FPS-Server periodically performs the following background operation: it scans for files that have expired and deletes them to save disk space. Specifically, the FPS-Server scans each file in the HDFS to monitor its storage duration and checks whether that duration has reached a preset duration. The preset duration is set according to the actual situation and is not limited here; for example, it may be 3 months. When the storage duration of a file reaches the preset duration, the file has been stored for a long time, so, to implement life cycle management of file storage, the file is deleted from the distributed file system and its storage location information is deleted from the distributed storage system.
In this embodiment, the internet data center further includes a distributed application coordination service, and before the step S80, the method further includes:
the server sends request information for acquiring a delete lock to the distributed application coordination service;
and when the lock is successfully acquired, scanning each file in the distributed file system to monitor the storage duration of each file.
As shown in FIG. 3, the application coordination service is represented by Zookeeper. Before deleting the file, the FPS-Server sends request information for acquiring a delete lock to the Zookeeper, and if the lock can be successfully acquired, the step S80 is executed.
For better understanding of the present invention, referring to fig. 8, the FPS-Server periodically sends request information for acquiring a delete lock to the Zookeeper. If the lock is acquired, the FPS-Server requests the expired data from the distributed storage system HBase, that is, the storage location information in HBase corresponding to files whose storage duration has reached the preset duration, and then deletes that storage location information. Subsequently, the FPS-Server requests a delete lock from the Zookeeper and, after the lock is acquired, deletes from the HDFS the files whose storage duration has reached the preset duration.
In the embodiment, due to the fact that the expired data is deleted at regular time, the file storage has a life cycle, expired files can be deleted at regular time, the file volume is prevented from being too large, and the intelligence of the file storage is improved.
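The lock-guarded expiry pass of this embodiment might look like the sketch below. It is a simplification under stated assumptions: a local `threading.Lock` stands in for the delete lock obtained from the Zookeeper, a single lock covers both deletions, dictionaries stand in for the HDFS and HBase, and `purge_expired` is a hypothetical name.

```python
import threading


def purge_expired(files, index, now, max_age_seconds, delete_lock):
    """Delete expired files and their location records in one guarded pass.

    files: dict of file name -> time the file was stored (stand-in for HDFS).
    index: dict of file name -> location info (stand-in for HBase).
    Returns the names of the deleted files, or [] if the lock is unavailable.
    """
    # Try to take the delete lock without waiting; skip this round on failure.
    if not delete_lock.acquire(blocking=False):
        return []
    try:
        expired = [name for name, stored_at in files.items()
                   if now - stored_at >= max_age_seconds]
        for name in expired:
            index.pop(name, None)  # delete the storage location info first
            del files[name]        # then delete the file itself
        return expired
    finally:
        delete_lock.release()


lock = threading.Lock()
stored_files = {"old.dat": 0.0, "new.dat": 900.0}
locations = {"old.dat": "combined-0001", "new.dat": "combined-0002"}
deleted = purge_expired(stored_files, locations, 1000.0, 500.0, lock)
```

Returning immediately when the lock is held by another server means at most one server performs the deletion in a given round, which appears to be the purpose of the delete lock in the text.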
Further, a fourth embodiment of the file storing method of the present invention is proposed based on the first embodiment.
The fourth embodiment of the file storage method is different from the first to third embodiments of the file storage method in that the server is located in a primary internet data center, and in a case where a backup internet data center exists in the system, after the step S40, the method includes:
step D, the server synchronizes the stored files to a server of the file processing system in the standby internet data center, so that the server of the file processing system in the standby internet data center can execute the file storage operation.
In this embodiment, multiple internet data centers (IDCs) are deployed, for example two IDCs, namely a main IDC and a standby IDC, which communicate with each other over the network. The business system, through the integrated FPS-Client, connects to the TGW and requests to upload a file; the TGW routes the request to one of the FPS-Servers in the main IDC according to a specified strategy, and the file upload then formally starts. The FPS-Server temporarily stores the file in the temporary folder, and the location information of the file is stored in HBase.
When the file has been successfully uploaded in the main IDC, the FPS-Server in the main IDC asynchronously uploads the file to the FPS-Server in the standby IDC, so that the files in the two clusters remain consistent. Synchronization between the main and standby FPS uses logical backup.
In the embodiment, through the backup of the files, the backup IDCs can continue to provide services under the condition that the main IDCs have faults, the storage and the use of the files are not influenced, and the availability is higher.
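The asynchronous main-to-standby synchronization can be sketched with a queue and a background worker. This is an illustrative assumption about how "asynchronous" upload might be arranged; the text does not describe the mechanism, and `start_replicator` and `upload` are hypothetical names, with dictionaries standing in for the two IDCs' stores.

```python
import queue
import threading


def start_replicator(standby_store):
    """Start a background worker that copies queued files to the standby."""
    pending = queue.Queue()

    def worker():
        while True:
            item = pending.get()
            if item is None:           # sentinel: stop the worker
                pending.task_done()
                break
            name, content = item
            standby_store[name] = content
            pending.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return pending, thread


def upload(name, content, primary_store, pending):
    """Store in the main IDC first, then queue the copy to the standby."""
    primary_store[name] = content
    pending.put((name, content))


main_idc, standby_idc = {}, {}
replication_queue, replicator = start_replicator(standby_idc)
upload("f1", b"data", main_idc, replication_queue)
replication_queue.join()       # wait until the standby has caught up
replication_queue.put(None)    # shut the worker down
replicator.join(timeout=2)
```

The upload call returns as soon as the main store is written, so the sender does not wait on the standby; the two stores are consistent once the queue drains.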
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a file storage program is stored on the computer-readable storage medium, and when executed by a processor, the file storage program implements the following operations:
receiving a file uploaded by a sender through a client;
caching the received files into a temporary folder, and recording storage position information of each file in a distributed storage system;
merging the files in the temporary folder to obtain merged files, and storing the merged files in a distributed file system;
and updating the corresponding storage position information in the distributed storage system based on the merged file.
Further, when the file storage program is executed by the processor, the operation of merging the files in the temporary folder to obtain a merged file is implemented as follows:
scanning each file in the temporary folder;
and acquiring a combined file, determining a file with a capacity value smaller than a preset threshold value after being combined with the combined file in the scanned files, and combining the determined file into the combined file.
Further, when executed by the processor, the file storage program further implements the following operations:
when a file query instruction is received, determining index information corresponding to the file query instruction;
searching a merged file pointed by the index information in a distributed file system;
and restoring the merged file to restore the file corresponding to the index information from the merged file.
Further, after the step of storing the merged file in the distributed file system, when the file storage program is executed by the processor, the following operations are also implemented:
generating file identification information and file hash information based on files stored in a distributed file system;
feeding back file identification information and file hash information to the sender through the client, so that the sender can transmit the file identification information and the file hash information to a receiver;
and when the client receives the file identification information sent by the receiver, extracting the file corresponding to the file identification information from the distributed file system, and feeding the file back to the receiver so that the receiver can check the file through the file hash information, and acquiring the file when the check is successful.
Further, there are a plurality of servers, the servers of the file processing system are connected to the client through a gateway, and the file is uploaded from the client to a server as follows: the gateway distributes the files uploaded by the client among the servers in turn (polling) according to a preset strategy.
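One possible reading of the gateway's polling strategy is plain round-robin routing, sketched below. The text only says "a preset strategy", so round-robin is an assumption, and `RoundRobinGateway` and the server names are hypothetical.

```python
import itertools


class RoundRobinGateway:
    """Route each upload request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(list(servers))

    def route(self):
        """Return the server that should receive the next upload."""
        return next(self._cycle)


gateway = RoundRobinGateway(["fps-server-1", "fps-server-2", "fps-server-3"])
targets = [gateway.route() for _ in range(4)]
```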
Further, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, when the file storage program is executed by the processor, the following operations are further implemented:
scanning each file in the distributed file system to monitor the storage duration of each file;
and deleting the files in the distributed file system and deleting the storage position information of the files in the distributed storage system when the storage duration of the files reaches the preset duration.
Further, the internet data center further includes a distributed application program coordination service, and before the step of scanning each file in the distributed file system by the server to monitor the storage duration of each file, when the file storage program is executed by the processor, the following operations are also implemented:
sending request information for acquiring a delete lock to the distributed application coordination service;
and when the lock is successfully acquired, scanning each file in the distributed file system to monitor the storage duration of each file.
Further, the server is located in a primary internet data center, and when the backup internet data center exists in the system, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the file storage program is executed by the processor, and the following operations are further implemented:
and synchronizing the stored files to a server of a file processing system where the standby internet data center is located so that the server of the file processing system where the standby internet data center is located executes file storage operation.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A file storage method, characterized in that it is applied to an internet data center, the internet data center comprising a file processing system, a distributed file system and a distributed storage system, the file processing system comprising a server and a client, the file storage method comprising:
the server of the file processing system receiving, through the client, a file uploaded by a sender;
caching the received file in a temporary folder, and recording storage location information of each file in the distributed storage system;
merging the files in the temporary folder to obtain a merged file, and storing the merged file in the distributed file system, wherein the server scans the directory in the temporary folder to obtain the lock corresponding to the directory, scans each file in the temporary folder according to the lock, acquires a preset combined file, determines, among the scanned files, the files whose capacity value after being merged with the combined file is smaller than a preset threshold, and merges the determined files into the combined file to obtain the merged file;
updating the corresponding storage location information in the distributed storage system based on the merged file.

2. The file storage method according to claim 1, wherein the method further comprises:
when a file query instruction is received, determining index information corresponding to the file query instruction;
searching the distributed file system for the merged file pointed to by the index information;
restoring the merged file to restore the file corresponding to the index information from the merged file.

3. The file storage method according to claim 1, wherein after the step of storing the merged file in the distributed file system, the method further comprises:
the server generating file identification information and file hash information based on the files stored in the distributed file system;
feeding back the file identification information and the file hash information to the sender through the client, so that the sender can transmit the file identification information and the file hash information to a receiver;
when the file identification information sent by the receiver is received through the client, extracting the file corresponding to the file identification information from the distributed file system and feeding it back to the receiver, so that the receiver can check the file using the file hash information and obtain the file when the check succeeds.

4. The file storage method according to claim 1, wherein there are a plurality of servers, the servers of the file processing system are connected to the client through a gateway, and the file is uploaded from the client to a server as follows: the gateway distributes the files uploaded by the client among the servers in turn (polling) according to a preset strategy.

5. The file storage method according to claim 1, wherein after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the method further comprises:
the server scanning each file in the distributed file system to monitor the storage duration of each file;
when the storage duration of a file reaches a preset duration, deleting the file from the distributed file system and deleting the storage location information of the file from the distributed storage system.

6. The file storage method according to claim 5, wherein the internet data center further comprises a distributed application coordination service, and before the step of the server scanning each file in the distributed file system to monitor the storage duration of each file, the method further comprises:
the server sending request information for a delete lock to the distributed application coordination service;
when the lock is successfully acquired, executing the step of scanning each file in the distributed file system to monitor the storage duration of each file.

7. The file storage method according to any one of claims 1 to 6, wherein the server is located in a main internet data center, and in the case where a standby internet data center exists in the system, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the method comprises:
the server synchronizing the stored files to the server of the file processing system in the standby internet data center, so that the server of the file processing system in the standby internet data center can execute the file storage operation.

8. A server, characterized in that the server comprises a memory, a processor, and a file storage program stored on the memory and executable on the processor, wherein the file storage program, when executed by the processor, implements the steps of the file storage method according to any one of claims 1 to 7.

9. A computer-readable storage medium, characterized in that a file storage program is stored on the computer-readable storage medium, and the file storage program, when executed by a processor, implements the steps of the file storage method according to any one of claims 1 to 7.
CN201710885384.3A | 2017-09-26 | 2017-09-26 | File storage method, server and computer readable storage medium | Active | CN107861686B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201710885384.3A | 2017-09-26 | 2017-09-26 | File storage method, server and computer readable storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201710885384.3A | 2017-09-26 | 2017-09-26 | File storage method, server and computer readable storage medium

Publications (2)

Publication Number | Publication Date
CN107861686A (en) | 2018-03-30
CN107861686B (en) | 2021-01-05

Family

ID=61698675

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201710885384.3A (Active, CN107861686B (en)) | File storage method, server and computer readable storage medium | 2017-09-26 | 2017-09-26

Country Status (1)

Country | Link
CN (1) | CN107861686B (en)

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
