Background
Traditional mass file storage is implemented with a dedicated file server such as NAS (Network Attached Storage). A NAS is a dedicated data storage server that includes a storage device (e.g., a disk array, a CD/DVD drive, a tape drive, or a removable storage medium) and embedded system software, and that provides a cross-platform file sharing function. A NAS usually has its own node on a LAN (Local Area Network) and allows users to access data over the network without the intervention of an application server.
In the existing file storage architecture, a plurality of front-end servers share a back-end NAS device through a dedicated storage network, and the storage space on the back-end NAS device is shared with the front-end hosts through the CIFS (Common Internet File System) and NFS (Network File System) protocols, so that the same directory or file can be read and written concurrently. The file system is located in the back-end storage system, and the connection uses standard Ethernet links and the TCP (Transmission Control Protocol)/IP (Internet Protocol) protocols, so that file storage can be shared among multiple systems. However, the storage capacity of a NAS device is limited, while data volumes keep growing as time passes and business develops. The conventional file storage mode therefore has difficulty coping with the explosive growth of data; that is, as the data volume becomes larger and larger, the capacity of the existing file storage mode can hardly meet the requirement.
Disclosure of Invention
The invention mainly aims to provide a file storage method, a server and a computer readable storage medium, and aims to solve the technical problem that the storage requirement is difficult to meet under the condition that the data capacity is increased in the existing file storage mode.
In order to achieve the above object, the present invention provides a file storage method applied to an internet data center, where the internet data center includes a file processing system, a distributed file system, and a distributed storage system, the file processing system includes a server and a client, and the file storage method includes:
a server of the file processing system receives a file uploaded by a sender through a client;
caching the received files into a temporary folder, and recording storage position information of each file in a distributed storage system;
merging the files in the temporary folder to obtain merged files, and storing the merged files in a distributed file system;
and updating the corresponding storage position information in the distributed storage system based on the merged file.
Optionally, the step of merging the files in the temporary folder to obtain a merged file includes:
the server side scans each file in the temporary folder;
and acquiring a combined file, determining, among the scanned files, a file whose merging into the combined file keeps the capacity value of the combined file smaller than a preset threshold, and merging the determined file into the combined file.
Optionally, the method further comprises:
when a file query instruction is received, determining index information corresponding to the file query instruction;
searching a merged file pointed by the index information in a distributed file system;
and restoring the merged file to restore the file corresponding to the index information from the merged file.
Optionally, after the step of storing the merged file in the distributed file system, the method further includes:
the server generates file identification information and file hash information based on files stored in the distributed file system;
feeding back file identification information and file hash information to the sender through the client, so that the sender can transmit the file identification information and the file hash information to a receiver;
and when the client receives the file identification information sent by the receiver, extracting the file corresponding to the file identification information from the distributed file system, and feeding the file back to the receiver so that the receiver can check the file through the file hash information, and acquiring the file when the check is successful.
Optionally, there are a plurality of servers, the servers of the file processing system are connected with the client through a gateway, and the manner in which a file is uploaded from the client to a server includes: the gateway distributes the files uploaded by the client to the servers in a polling manner according to a preset strategy.
Optionally, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the method further includes:
the server side scans each file in the distributed file system to monitor the storage duration of each file;
and deleting the files in the distributed file system and deleting the storage position information of the files in the distributed storage system when the storage duration of the files reaches the preset duration.
Optionally, the internet data center further includes a distributed application coordination service, and before the step of scanning, by the server, each file in the distributed file system to monitor the storage duration of each file, the method further includes:
the server side sends, to the distributed application program coordination service, request information for acquiring a deletion lock;
and when the lock is successfully acquired, scanning each file in the distributed file system to monitor the storage time of each file.
Optionally, the server is located in a primary internet data center, and after the step of updating, based on the merged file, the corresponding storage location information in the distributed storage system, when a backup internet data center exists in the system, the method includes:
and the server synchronizes the stored file to the server of the file processing system where the standby internet data center is located so that the server of the file processing system where the standby internet data center is located executes the file storage operation.
In addition, in order to achieve the above object, the present invention further provides a server, where the server includes a memory, a processor, and a file storage program stored in the memory and executable on the processor, and the file storage program, when executed by the processor, implements the steps of the file storage method as described above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a file storage program which, when executed by a processor, implements the steps of the file storage method as described above.
According to the technical scheme provided by the invention, a server of the file processing system first receives the file uploaded by a sender through a client, then caches the received file in a temporary folder and records the storage location information of each file in the distributed storage system, then merges the files in the temporary folder to obtain a merged file and stores the merged file in the distributed file system, and finally updates the corresponding storage location information in the distributed storage system based on the merged file, so that the file can subsequently be read according to the storage location information. In this scheme, the received files are merged and the merged files are stored in the distributed file system. Merging increases the number of files the system can store; in addition, the distributed file system is scalable, so storing files through it further increases that number. Compared with the existing file storage mode, the scheme offers a larger storage capacity and is better suited to storing a large number of small files.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The solution of the embodiment of the invention is mainly as follows: the server of the file processing system first receives the file uploaded by the sender through the client, then caches the received file in the temporary folder and records the storage location information of each file in the distributed storage system, then merges the files in the temporary folder to obtain the merged file and stores the merged file in the distributed file system, and finally updates the corresponding storage location information in the distributed storage system based on the merged file, so that the file can subsequently be read according to the storage location information. This solves the problem that storage requirements are difficult to meet in the existing file storage mode.
The existing file storage method also has the following defects:
it has no life cycle management function and does not support functions such as deleting temporary files on expiry, which easily leads to excessive stored data;
it is not suitable for access by a large number of systems, and its installation and deployment are relatively troublesome.
Based on the problems in the prior art, the invention builds an FPS (File Processing System). The FPS supports the storage of mass data, and ensures high availability of the service by storing multiple copies of the data across racks and machine rooms. The main application scenarios include:
(1) providing an intermediate platform for file exchange among different systems, for example, providing a reconciliation file for a system B through the intermediate platform by the system A for reconciliation;
(2) providing a data storage platform based on file life cycle management, which supports mass data storage and automatically deletes files once their retention period expires.
Explanation of the technical terms used in the description:
Hadoop: a distributed system infrastructure on which a user can build and use a distributed computing platform, and develop and run applications that process mass data.
HDFS: the Hadoop Distributed File System. HDFS is highly fault tolerant and is designed for deployment on low-cost hardware; it provides high-throughput access to application data and is suitable for applications with very large data sets.
HBase: a highly reliable, high-performance, column-oriented, and scalable distributed storage system. With HBase, a large-scale structured storage cluster can be built on low-cost PC servers. It belongs to the Hadoop ecosystem.
ZooKeeper: a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming service, distributed synchronization, and group services. It belongs to the Hadoop ecosystem.
TGW: Tencent GateWay, a system that provides unified multi-network access, forwards external network requests, and supports automatic load balancing. The TGW is also referred to as the gateway.
NAS: Network Attached Storage, a device that is connected to a network and provides data storage, also called "network storage". It is a dedicated data storage server.
RMB: a message bus system used for RPC message services among multiple systems.
As shown in fig. 1, fig. 1 is a schematic diagram of a server-side structure of a hardware operating environment according to an embodiment of the present invention.
The server in the embodiment of the present invention may be a Personal Computer (PC), a tablet computer, a portable computer, or other terminal equipment with a display function.
As shown in fig. 1, the server may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface (e.g., for connecting a wired keyboard or a wired mouse) and a wireless interface (e.g., for connecting a wireless keyboard or a wireless mouse). The network interface 1004 may optionally include a standard wired interface (for connecting to a wired network) and a wireless interface (e.g., a WI-FI interface, a Bluetooth interface, or an infrared interface, for connecting to a wireless network). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the server may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like.
Those skilled in the art will appreciate that the server side architecture shown in FIG. 1 does not constitute a limitation of the server side, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a kind of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a file storage program. The operating system is a program for managing and controlling the server and its software resources, and supports the operation of the network communication module, the user interface module, the file storage program, and other programs or software; the network communication module is used for managing and controlling the network interface 1004; the user interface module is used for managing and controlling the user interface 1003.
In the server shown in fig. 1, the network interface 1004 is mainly used for connecting to a server of a standby internet data center and performing data communication with that server; the user interface 1003 is mainly used for connecting to a client (user side) and performing data communication with the client; the server calls, through the processor 1001, the file storage program stored in the memory 1005, and executes the following steps:
receiving a file uploaded by a sender through a client;
caching the received files into a temporary folder, and recording storage position information of each file in a distributed storage system;
merging the files in the temporary folder to obtain merged files, and storing the merged files in a distributed file system;
and updating the corresponding storage position information in the distributed storage system based on the merged file.
Further, the server calls, through the processor 1001, the file storage program stored in the memory 1005 to merge the files in the temporary folder into a merged file through the following steps:
scanning each file in the temporary folder;
and acquiring a combined file, determining, among the scanned files, a file whose merging into the combined file keeps the capacity value of the combined file smaller than a preset threshold, and merging the determined file into the combined file.
Further, the server calls, through the processor 1001, the file storage program stored in the memory 1005 to implement the following steps:
when a file query instruction is received, determining index information corresponding to the file query instruction;
searching a merged file pointed by the index information in a distributed file system;
and restoring the merged file to restore the file corresponding to the index information from the merged file.
Further, after the step of storing the merged file in the distributed file system, the server calls, through the processor 1001, the file storage program stored in the memory 1005 to implement the following steps:
generating file identification information and file hash information based on files stored in a distributed file system;
feeding back file identification information and file hash information to the sender through the client, so that the sender can transmit the file identification information and the file hash information to a receiver;
and when the client receives the file identification information sent by the receiver, extracting the file corresponding to the file identification information from the distributed file system, and feeding the file back to the receiver so that the receiver can check the file through the file hash information, and acquiring the file when the check is successful.
Further, there are a plurality of servers, the servers of the file processing system are connected with the client through a gateway, and the manner in which a file is uploaded from the client to a server includes: the gateway distributes the files uploaded by the client to the servers in a polling manner according to a preset strategy.
Further, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the server calls, through the processor 1001, the file storage program stored in the memory 1005 to implement the following steps:
scanning each file in the distributed file system to monitor the storage duration of each file;
and deleting the files in the distributed file system and deleting the storage position information of the files in the distributed storage system when the storage duration of the files reaches the preset duration.
Further, the internet data center further includes a distributed application program coordination service, and before the step of scanning each file in the distributed file system to monitor the storage duration of each file, the server calls, through the processor 1001, the file storage program stored in the memory 1005 to implement the following steps:
sending, to the distributed application program coordination service, request information for acquiring a deletion lock;
and when the lock is successfully acquired, scanning each file in the distributed file system to monitor the storage time of each file.
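The lock-then-scan behaviour described above can be sketched as follows. This is only a minimal illustration under stated assumptions: `InMemoryCoordinator`, `try_acquire`, `release`, `scan_if_lock_held`, and the lock name `"delete-lock"` are hypothetical names, and the in-memory class merely stands in for the distributed application program coordination service; it is not the ZooKeeper API.

```python
import threading


class InMemoryCoordinator:
    """Hypothetical in-process stand-in for the coordination service."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owners = {}  # lock name -> current owner

    def try_acquire(self, lock_name, owner):
        """Grant the lock only if nobody holds it; never blocks."""
        with self._guard:
            if lock_name in self._owners:
                return False
            self._owners[lock_name] = owner
            return True

    def release(self, lock_name, owner):
        """Release the lock, but only on behalf of its current owner."""
        with self._guard:
            if self._owners.get(lock_name) == owner:
                del self._owners[lock_name]


def scan_if_lock_held(coordinator, owner, scan):
    """Run scan() only when the deletion lock was acquired; otherwise skip."""
    if not coordinator.try_acquire("delete-lock", owner):
        return None  # another server is already scanning
    try:
        return scan()
    finally:
        coordinator.release("delete-lock", owner)


coordinator = InMemoryCoordinator()
coordinator.try_acquire("delete-lock", "fps-server-1")
skipped = scan_if_lock_held(coordinator, "fps-server-2", lambda: "scanned")
coordinator.release("delete-lock", "fps-server-1")
done = scan_if_lock_held(coordinator, "fps-server-2", lambda: "scanned")
```

With several servers running the same periodic task, a lock of this kind would let only one of them scan and delete at a time.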
Further, the server is located in the primary internet data center, and, in the case that a backup internet data center exists in the system, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, the server calls, through the processor 1001, the file storage program stored in the memory 1005 to implement the following steps:
and synchronizing the stored files to a server of a file processing system where the standby internet data center is located so that the server of the file processing system where the standby internet data center is located executes file storage operation.
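As a rough illustration of the primary-to-standby synchronization, the sketch below forwards each stored file to a standby server, which then performs the same storage operation. The class `FpsServer`, its `store` method, and the in-memory dictionary are hypothetical simplifications; the text does not specify the transport or the retry behaviour, so none is modelled here.

```python
class FpsServer:
    """Hypothetical model of a server that may have a standby peer."""

    def __init__(self, name, standby=None):
        self.name = name
        self.standby = standby  # server in the standby data center, if any
        self.files = {}         # stand-in for the local storage back end

    def store(self, file_id, data):
        """Store locally, then synchronize to the standby when one exists."""
        self.files[file_id] = data
        if self.standby is not None:
            # The standby executes its own file storage operation.
            self.standby.store(file_id, data)


standby = FpsServer("fps-server-standby")
primary = FpsServer("fps-server-primary", standby=standby)
primary.store("file-001", b"reconciliation data")
```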
Based on the hardware structure of the server, the file storage method provided by the invention has various embodiments.
Referring to fig. 2, fig. 2 is a flowchart illustrating a file storage method according to a first embodiment of the present invention.
In this embodiment, the file storage method is applied to an internet data center, where the internet data center includes a file processing system, a distributed file system, and a distributed storage system, the file processing system includes a server and a client, and the file storage method includes:
a server of the file processing system receives a file uploaded by a sender through a client; caching the received files into a temporary folder, and recording storage position information of each file in a distributed storage system; merging the files in the temporary folder to obtain merged files, and storing the merged files in a distributed file system; and updating the corresponding storage position information in the distributed storage system based on the merged file.
In this embodiment, the file storage method is applied to a server of the file processing system FPS in an IDC (Internet Data Center), and the server may be the server shown in fig. 1.
It should be noted that the embodiment of the present invention provides an FPS (File Processing System). The FPS is an independently developed storage and processing system for small files, with features such as life cycle management and cross-machine-room disaster recovery.
In the implementation of the present invention, the structure diagram of the IDC may refer to fig. 3:
The IDC comprises a file processing system FPS, a distributed file system HDFS, and a distributed storage system HBase. The FPS mainly comprises two parts: FPS-Client and FPS-Server. A business program uses the functions provided by the FPS by integrating the FPS-Client.
FPS-Client: the client of the FPS, provided externally in Java, C, and Python versions.
FPS-Server: the background server program of the FPS provides main service logic processing, including functions of authority control, file storage and reading, file life cycle management and the like.
As can be seen from fig. 3, the server and the client of the file processing system are connected through a gateway. The gateway is a TGW (Tencent GateWay), which performs load balancing when forwarding files.
It should be noted that the numbers of servers and clients of the FPS are not limited and may be set according to the actual situation. When there are a plurality of servers, the manner in which a file is uploaded from the client to a server includes: the gateway distributes the files uploaded by the client to the servers in a polling manner according to a preset strategy.
That is, all traffic uploaded from the FPS-Client to the FPS-Server passes through the gateway, namely the TGW, and the TGW distributes the files to the FPS-Servers in a polling manner. After receiving a file, the FPS-Server stores the content of the file in the HDFS (distributed file system) and records the storage location information of the file in HBase (distributed storage system).
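The polling distribution performed by the gateway can be illustrated with a simple round-robin dispatcher. This is a sketch only: the text says the gateway follows "a preset strategy" without naming it, so plain round-robin is an assumption, and `RoundRobinGateway`, `dispatch`, and the server names are hypothetical.

```python
from itertools import cycle


class RoundRobinGateway:
    """Hypothetical gateway stand-in: forwards uploads to servers in turn."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)  # endless iterator over the servers

    def dispatch(self, filename):
        """Pick the next server for this upload and return the pairing."""
        return next(self._servers), filename


gateway = RoundRobinGateway(["fps-server-1", "fps-server-2", "fps-server-3"])
assignments = [gateway.dispatch(f"file-{i}.txt")[0] for i in range(5)]
```

After the last server has been used, the dispatcher wraps around to the first, so uploads are spread evenly when files are of similar size.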
It should be noted that the HDFS is a distributed file system that uses PCs as storage, and a file generally has multiple copies, for example three. Machines can be added dynamically and linearly, so that, in the face of the exponential growth of internet services, capacity can be expanded without downtime and service requirements can be well met. Meanwhile, the storage location information of the files is stored in HBase, which can hold file indexes on the order of billions; like HDFS, HBase is a member of the Hadoop ecosystem, and its storage performance can be improved by adding nodes.
In the embodiment of the invention, the interaction between the FPS-Client and the FPS-Server is carried out over HTTP (HyperText Transfer Protocol).
The following are specific steps of implementing file storage in this embodiment:
Step S10, the server of the file processing system receives the file uploaded by the sender through the client;
That is, the server receives, through the TGW, the file uploaded by the client.
Step S20, caching the received files into a temporary folder, and recording the storage position information of each file in a distributed storage system;
In this embodiment, as shown in fig. 3, the FPS-Server receives a file sent by a sender through the FPS-Client. After receiving the file, the FPS-Server first caches it in a temporary folder, which is preferably a designated directory of the HDFS. After caching the file in the temporary folder, the FPS-Server records the storage location information of each file in the distributed storage system.
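Step S20 can be sketched as below. This is a minimal illustration under assumptions: a local temporary directory stands in for the designated HDFS directory, a plain dictionary stands in for the distributed storage system, and `cache_upload` together with the index fields (`name`, `path`, `offset`, `length`) are hypothetical names not given in the text.

```python
import os
import tempfile
import uuid


def cache_upload(temp_dir, index, name, data):
    """Write an uploaded file into the temporary folder and record its location."""
    file_id = uuid.uuid4().hex  # hypothetical identifier scheme
    path = os.path.join(temp_dir, file_id)
    with open(path, "wb") as fh:
        fh.write(data)
    # Storage location information: where the bytes currently live.
    index[file_id] = {"name": name, "path": path, "offset": 0, "length": len(data)}
    return file_id


temp_dir = tempfile.mkdtemp(prefix="fps_tmp_")  # stand-in for the HDFS temp directory
index = {}                                      # stand-in for the distributed storage system
fid = cache_upload(temp_dir, index, "recon.csv", b"a,b\n1,2\n")
```

Recording an offset and a length even for an unmerged file keeps the index entry in the same shape before and after merging.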
Step S30, merging each file in the temporary folder to obtain a merged file, and storing the merged file in the distributed file system;
Specifically, the step of "merging each file in the temporary folder to obtain a merged file" includes:
step 1, the server scans the directory of the temporary folder and requests the lock corresponding to the directory;
step 2, once the lock is acquired, the server scans each file in the temporary folder;
step 3, acquiring the combined file, determining, among the scanned files, a file whose merging into the combined file keeps the capacity value of the combined file smaller than a preset threshold, and merging the determined file into the combined file.
Further, after step 3, the method further comprises:
step 4, deleting the files that have been merged from the temporary folder.
That is, after the uploaded files are cached in the temporary folder, the FPS-Server scans the directory of the temporary folder and requests the lock corresponding to that directory. If the lock is acquired, the FPS-Server scans the capacity value of each file in the temporary folder and merges those files whose merging keeps the capacity value of the combined file smaller than a preset threshold. In this embodiment, the preset threshold is set according to the actual situation and is not limited here. The files are merged as follows: a preset combined file is acquired; each file in the temporary folder is scanned; among the scanned files, the files that can be merged into the combined file while keeping its capacity value smaller than the preset threshold are determined and merged into the combined file; after merging, the combined file is stored in the distributed file system (HDFS), and the index information corresponding to the merged files in the distributed storage system is updated according to the combined file.
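The threshold-based merging can be sketched as follows. The sketch assumes local directories in place of HDFS and a dictionary in place of the distributed storage system; `merge_small_files`, the index fields, and the choice to append files in name order are all hypothetical, and the directory lock is omitted for brevity.

```python
import os
import tempfile


def merge_small_files(temp_dir, combined_path, index, threshold):
    """Append files from temp_dir to the combined file while it stays under threshold."""
    merged_ids = []
    with open(combined_path, "ab") as combined:
        size = os.path.getsize(combined_path)  # current size of the combined file
        for file_id in sorted(os.listdir(temp_dir)):
            src = os.path.join(temp_dir, file_id)
            length = os.path.getsize(src)
            if size + length >= threshold:
                continue  # merging would reach the threshold; leave it for later
            with open(src, "rb") as fh:
                combined.write(fh.read())
            # Update the index so the file can be found inside the combined file.
            index[file_id] = {"path": combined_path, "offset": size, "length": length}
            size += length
            os.remove(src)  # delete the merged file from the temporary folder
            merged_ids.append(file_id)
    return merged_ids


temp_dir = tempfile.mkdtemp(prefix="fps_tmp_")
store_dir = tempfile.mkdtemp(prefix="fps_store_")
for name, data in [("a", b"x" * 40), ("b", b"y" * 50), ("c", b"z" * 30)]:
    with open(os.path.join(temp_dir, name), "wb") as fh:
        fh.write(data)

index = {}
combined_path = os.path.join(store_dir, "combined-0001")
merged = merge_small_files(temp_dir, combined_path, index, threshold=100)
```

In this run, files `a` (40 bytes) and `b` (50 bytes) fit under the 100-byte threshold, while `c` would push the combined file to 120 bytes and therefore stays in the temporary folder.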
In the embodiment of the present invention, the files in the temporary folder may be scanned periodically or in real time; the scanned small files are then merged, and the storage location information of the small files after merging is updated in HBase. Because the FPS-Server merges the small files, the files subsequently stored in the distributed file system are not scattered, the space occupied by the stored files is reduced, and the memory space of the cluster nodes is significantly saved.
For better understanding of the embodiment, refer to fig. 4. First, the FPS-Server scans the temporary directory of the HDFS and requests the lock corresponding to the directory. Once the lock is acquired, it scans each file under the directory and obtains the index information of the small files from the distributed storage system. It then acquires a combined file and merges the small files into it. Finally, it updates the index information and deletes the original small files from the distributed file system.
Step S40, based on the merged file, updating the corresponding storage location information in the distributed storage system.
After the merged file is obtained, the FPS-Server updates the storage location information corresponding to the merged file in the distributed storage system, so that when a file is subsequently searched for, the FPS-Server can locate it according to the storage location information.
In addition, in the embodiment of the present invention, the method further includes:
step A, when a file query instruction is received, determining index information corresponding to the file query instruction;
step B, searching a merged file pointed by the index information in a distributed file system;
step C, restoring the merged file to restore the file corresponding to the index information from the merged file.
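Steps A to C can be illustrated with the sketch below. It assumes that the index information consists of the merged file's path plus a byte offset and length, which the text does not state explicitly; `read_from_merged` and the sample file names are hypothetical.

```python
import os
import tempfile


def read_from_merged(index, file_id):
    """Look up the index entry, open the merged file, and cut out one original file."""
    entry = index[file_id]              # step A: index information for the query
    with open(entry["path"], "rb") as fh:  # step B: the merged file it points to
        fh.seek(entry["offset"])
        return fh.read(entry["length"])    # step C: restore the original bytes


store_dir = tempfile.mkdtemp(prefix="fps_store_")
merged_path = os.path.join(store_dir, "combined-0001")
with open(merged_path, "wb") as fh:
    fh.write(b"hello" + b"world!")

index = {
    "f1": {"path": merged_path, "offset": 0, "length": 5},
    "f2": {"path": merged_path, "offset": 5, "length": 6},
}
restored = read_from_merged(index, "f2")
```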
According to the technical scheme provided by this embodiment, a server of the file processing system receives a file uploaded by a sender through a client, caches the received file in a temporary folder, and records the storage location information of each file in the distributed storage system; it then merges the files in the temporary folder to obtain a merged file, stores the merged file in the distributed file system, and finally updates the corresponding storage location information in the distributed storage system based on the merged file, so that the file can subsequently be read according to the storage location information. In this scheme, the received files are merged and the merged files are stored in the distributed file system. Merging increases the number of files the system can store; in addition, the distributed file system is scalable, so storing files through it further increases that number. Compared with the existing file storage mode, the scheme offers a larger storage capacity and is better suited to storing a large number of small files.
Further, referring to fig. 5, a second embodiment of the file storing method of the present invention is proposed based on the first embodiment.
The second embodiment of the file storage method differs from the first embodiment of the file storage method in that, after the step S30, the method further includes:
step S50, the server generates file identification information and file hash information based on the files stored in the distributed file system;
step S60, feeding back file identification information and file hash information to the sender through the client, so that the sender can transmit the file identification information and the file hash information to a receiver;
step S70, when receiving the file identification information sent by the receiver through the client, extracting the file corresponding to the file identification information in the distributed file system, and feeding back the file to the receiver, so that the receiver can check the file through the file hash information, and obtain the file when the check is successful.
In this embodiment, after the FPS-Server stores a File in the HDFS, the FPS-Server generates File identification information (File Id) and File Hash information (File Hash) corresponding to the File according to the File stored in the HDFS, and after obtaining the File Id and the File Hash, the FPS-Server feeds back the File Id and the File Hash to the sender, so that the sender transmits the File Id and the File Hash to the receiver.
It should be noted that the sender and the receiver interact with each other through the RMB message service bus. And after receiving the File Id and the File Hash, the receiving party sends the File Id to the FPS-Server by using the FPS-Client.
When the FPS-Server receives the File Id sent by the receiver through the FPS-Client, the File corresponding to the File Id is extracted from the HDFS and fed back to the receiver, so that the receiver can check the File through the File Hash and acquire the File when the check is successful. That is, the receiver downloads the File from the FPS-Server through the File Id, checks the accuracy of the File through the File Hash, and completes the downloading and the File accuracy check in the FPS-Client.
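The File Id and File Hash exchange can be sketched as follows. The text does not name the hash algorithm or the identifier format, so SHA-256 and a random UUID are assumptions; `register_file`, `download_and_verify`, and the dictionary standing in for the distributed file system are hypothetical.

```python
import hashlib
import uuid


def register_file(store, data):
    """Server side: store the file, then return its File Id and File Hash."""
    file_id = uuid.uuid4().hex                    # assumed identifier format
    store[file_id] = data
    return file_id, hashlib.sha256(data).hexdigest()  # assumed hash algorithm


def download_and_verify(store, file_id, expected_hash):
    """Receiver side: fetch by File Id and accept only if the hash matches."""
    data = store[file_id]
    if hashlib.sha256(data).hexdigest() != expected_hash:
        raise ValueError("file hash mismatch; download rejected")
    return data


store = {}  # stand-in for the distributed file system
file_id, file_hash = register_file(store, b"reconciliation data")
received = download_and_verify(store, file_id, file_hash)
```

Because the receiver obtains the hash from the sender rather than from the storage side, a file altered in storage or in transit fails the check.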
For a better understanding of this embodiment, referring to fig. 6, the sender uploads the file to the FPS. After the file is successfully uploaded, the FPS returns the File Id and the File Hash of the file to the sender, and the sender then sends them to the receiver through the RMB message service bus.
In the embodiment, file transmission between the sender and the receiver is realized through each system in the internet data center, and the file identification information and the file hash information are used for verifying the file, so that the accuracy of file transmission is improved.
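The receiver-side check described above can be illustrated with a minimal sketch. The embodiment does not name the hash algorithm, so SHA-256 is an assumption here, and the function names compute_file_hash and verify_download are hypothetical.

```python
import hashlib


def compute_file_hash(data: bytes) -> str:
    """Return a hex digest of the file content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, expected_hash: str) -> bytes:
    """Return the downloaded file only if its hash matches the File Hash."""
    if compute_file_hash(data) != expected_hash:
        raise ValueError("file hash mismatch: download rejected")
    return data
```

In this sketch the sender would obtain the value of compute_file_hash from the server after upload, and the receiver would call verify_download on the bytes fetched by File Id.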
Further, referring to fig. 7, a third embodiment of the file storing method of the present invention is proposed based on the first embodiment.
The third embodiment of the file storage method differs from the first embodiment of the file storage method in that, after the step S40, the method includes:
step S80, the server scans each file in the distributed file system to monitor the storage duration of each file;
and step S90, deleting the files in the distributed file system and deleting the storage position information of the files in the distributed storage system when the storage duration of the files reaches a preset duration.
In this embodiment, the server preferably scans the files in the distributed file system on a timer. That is, after files are stored in the distributed file system HDFS, the FPS-Server periodically scans for and deletes expired files in the background to save disk space. Specifically, the FPS-Server scans each file in the HDFS to monitor its storage duration and compares it with a preset duration. The preset duration is set according to actual conditions and is not limited here; for example, it may be 3 months. When the storage duration of a file reaches the preset duration, the file is considered to have been stored for a long time; to implement life cycle management of file storage, the file is deleted from the distributed file system and its storage location information is deleted from the distributed storage system.
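As a minimal sketch of steps S80 and S90, the scan can be modelled over in-memory dictionaries that stand in for the distributed file system and the distributed storage system. The function name delete_expired, the dictionary layout, and the use of 90 days for "3 months" are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Assumed stand-in for the example preset duration of 3 months.
PRESET_DURATION = timedelta(days=90)


def delete_expired(files: dict, locations: dict, now: datetime,
                   preset: timedelta = PRESET_DURATION) -> list:
    """Delete expired files and their storage location information.

    files maps a file path to the time it was stored (file system stand-in);
    locations maps a file path to its location record (storage system stand-in).
    Returns the sorted list of deleted paths.
    """
    expired = [path for path, stored_at in files.items()
               if now - stored_at >= preset]
    for path in expired:
        del files[path]            # step S90: delete the file itself
        locations.pop(path, None)  # step S90: delete its location record
    return sorted(expired)
```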
In this embodiment, the internet data center further includes a distributed application coordination service, and before the step S80, the method further includes:
the server sends request information for acquiring a delete lock to the distributed application coordination service;
and when the lock is successfully acquired, scanning each file in the distributed file system to monitor the storage time of each file.
As shown in FIG. 3, the distributed application coordination service is represented by Zookeeper. Before deleting files, the FPS-Server sends request information for acquiring a delete lock to the Zookeeper, and if the lock is successfully acquired, the step S80 is executed.
For a better understanding of the present invention, referring to fig. 8, the FPS-Server periodically sends request information for acquiring a delete lock to the Zookeeper. If the FPS-Server successfully acquires the lock, it requests the expired data from the distributed storage system Hbase, that is, the storage location information of the files whose storage duration has reached the preset duration, and then deletes that storage location information. Subsequently, the FPS-Server again requests a delete lock from the Zookeeper and, after acquiring the lock, deletes from the HDFS the files whose storage duration has reached the preset duration.
In this embodiment, expired data is deleted periodically, so that file storage has a life cycle, the stored file volume is prevented from growing too large, and the intelligence of file storage is improved.
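The lock-then-clean pattern described above may be sketched as follows. The class DeleteLock is a hypothetical in-process stand-in for the delete lock held by the coordination service; it does not use any Zookeeper client API, and a real deployment would obtain the lock from the coordination service instead.

```python
import threading


class DeleteLock:
    """In-process stand-in for the delete lock (an assumption, not Zookeeper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Non-blocking: returns False at once if another holder has the lock.
        return self._lock.acquire(blocking=False)

    def release(self) -> None:
        self._lock.release()


def run_cleanup(lock: DeleteLock, cleanup) -> bool:
    """Run cleanup only if the delete lock is acquired; report whether it ran."""
    if not lock.try_acquire():
        return False  # another server is already deleting; skip this round
    try:
        cleanup()
        return True
    finally:
        lock.release()
```

With several servers on the same timer, only the one that acquires the lock performs the scan in a given round, which is the purpose the lock serves in this embodiment.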
Further, a fourth embodiment of the file storing method of the present invention is proposed based on the first embodiment.
The fourth embodiment of the file storage method is different from the first to third embodiments of the file storage method in that the server is located in a primary internet data center, and in a case where a backup internet data center exists in the system, after the step S40, the method includes:
step D, the server synchronizes the stored files to a server of the file processing system located in the standby internet data center, so that the server of the file processing system located in the standby internet data center executes the file storage operation.
In this embodiment, multiple sets of internet data centers (IDCs) are deployed, for example two sets, namely a main IDC and a standby IDC, and the two IDCs are in network communication with each other. The business system connects to the TGW through the integrated FPS-Client and requests to upload a file; the TGW routes the request to one of the FPS-Servers in the main IDC according to a specified policy, and the file upload then formally starts. The FPS-Server temporarily stores the file in a temporary folder, and the location information of the file is stored in Hbase.
When the file is successfully uploaded in the main IDC, the FPS-Server in the main IDC asynchronously uploads the file to the FPS-Server in the standby IDC, thereby keeping the files in the two clusters consistent. Synchronization between the main and standby FPS uses logical backup.
In this embodiment, because the files are backed up, the standby IDC can continue to provide services if the main IDC fails, the storage and use of the files are not affected, and availability is higher.
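The asynchronous main-to-standby synchronization can be sketched with a queue and a background worker. The dictionaries standing in for the two IDCs and the names upload and start_replicator are assumptions for illustration; the actual transfer between FPS-Servers is not specified at this level of detail.

```python
import queue
import threading


def start_replicator(sync_queue: queue.Queue, standby: dict) -> threading.Thread:
    """Start a worker that copies queued files into the standby store."""

    def worker() -> None:
        while True:
            item = sync_queue.get()
            if item is None:  # sentinel: stop the worker
                sync_queue.task_done()
                break
            name, data = item
            standby[name] = data
            sync_queue.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def upload(primary: dict, sync_queue: queue.Queue, name: str, data: bytes) -> None:
    """Store the file in the main IDC, then queue it for the standby IDC."""
    primary[name] = data          # the upload succeeds once stored here
    sync_queue.put((name, data))  # replication happens later, off this path
```

The upload call returns without waiting for the standby copy, which is what makes the synchronization asynchronous in this sketch.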
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a file storage program is stored on the computer-readable storage medium, and when executed by a processor, the file storage program implements the following operations:
receiving a file uploaded by a sender through a client;
caching the received files into a temporary folder, and recording storage position information of each file in a distributed storage system;
merging the files in the temporary folder to obtain merged files, and storing the merged files in a distributed file system;
and updating the corresponding storage position information in the distributed storage system based on the merged file.
Further, when executed by the processor, the file storage program implements the operation of merging the files in the temporary folder to obtain a merged file through the following operations:
scanning each file in the temporary folder;
and acquiring a merged file, determining, among the scanned files, a file whose capacity value after being merged with the merged file is smaller than a preset threshold value, and merging the determined file into the merged file.
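The merging operations above can be sketched as a single pass over the scanned files. The function name merge_small_files, the byte-level concatenation, and the (offset, length) index layout are assumptions introduced for illustration.

```python
def merge_small_files(files: dict, threshold: int):
    """Merge scanned files while the merged size stays below the threshold.

    files maps a file name to its content. Returns a tuple of
    (merged content, index of name -> (offset, length), names left unmerged).
    """
    merged = bytearray()
    index = {}
    leftover = []
    for name, data in files.items():
        # Merge only if the merged file would remain smaller than the threshold.
        if len(merged) + len(data) < threshold:
            index[name] = (len(merged), len(data))
            merged.extend(data)
        else:
            leftover.append(name)
    return bytes(merged), index, leftover
```

Files left unmerged in one pass would be candidates for the next merged file.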
Further, when executed by the processor, the file storage program further implements the following operations:
when a file query instruction is received, determining index information corresponding to the file query instruction;
searching a merged file pointed by the index information in a distributed file system;
and restoring the merged file to restore the file corresponding to the index information from the merged file.
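A minimal sketch of the query path follows, assuming the index information records which merged file holds the requested file together with its offset and length. This index layout and the name query_file are hypothetical, and a dictionary stands in for the distributed file system.

```python
def query_file(file_id: str, index: dict, dfs: dict) -> bytes:
    """Restore one original file from the merged file that contains it.

    index maps a file id to (merged file path, offset, length);
    dfs maps a merged file path to its content.
    """
    merged_path, offset, length = index[file_id]  # determine index information
    merged = dfs[merged_path]                     # find the merged file
    return merged[offset:offset + length]         # restore the original file
```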
Further, after the step of storing the merged file in the distributed file system, when the file storage program is executed by the processor, the following operations are also implemented:
generating file identification information and file hash information based on files stored in a distributed file system;
feeding back file identification information and file hash information to the sender through the client, so that the sender can transmit the file identification information and the file hash information to a receiver;
and when receiving, through the client, the file identification information sent by the receiver, extracting the file corresponding to the file identification information from the distributed file system, and feeding the file back to the receiver, so that the receiver can check the file through the file hash information and acquire the file when the check is successful.
Further, there are a plurality of servers, the servers of the file processing system are connected with the client through a gateway, and the file is uploaded from the client to a server as follows: the gateway distributes the files uploaded by the client to the servers in a polling manner according to a preset policy.
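The polling distribution by the gateway can be sketched as plain round-robin routing. Round-robin is only one possible preset policy and is assumed here; the class name Gateway and the server labels are hypothetical.

```python
from itertools import cycle


class Gateway:
    """Routes each upload to the next server in turn (assumed round-robin)."""

    def __init__(self, servers: list) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._next_server = cycle(servers)

    def route(self, file_name: str) -> str:
        """Return the server chosen to receive this upload."""
        return next(self._next_server)
```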
Further, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, when the file storage program is executed by the processor, the following operations are further implemented:
scanning each file in the distributed file system to monitor the storage duration of each file;
and deleting the files in the distributed file system and deleting the storage position information of the files in the distributed storage system when the storage duration of the files reaches the preset duration.
Further, the internet data center further includes a distributed application program coordination service, and before the step of scanning each file in the distributed file system by the server to monitor the storage duration of each file, when the file storage program is executed by the processor, the following operations are also implemented:
sending request information for acquiring a delete lock to the distributed application program coordination service;
and when the lock is successfully acquired, scanning each file in the distributed file system to monitor the storage time of each file.
Further, the server is located in a primary internet data center, and in a case where a backup internet data center exists in the system, after the step of updating the corresponding storage location information in the distributed storage system based on the merged file, when the file storage program is executed by the processor, the following operations are further implemented:
and synchronizing the stored files to a server of the file processing system located in the standby internet data center, so that the server of the file processing system located in the standby internet data center executes the file storage operation.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.