CN102571959B


Info

Publication number: CN102571959B
Application number: CN201210007673.0A
Authority: CN
Inventors: 王力; 王�锋
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2012-01-11
Filing date: 2012-01-11
Publication date: 2015-05-06
Anticipated expiration: 2032-01-11
Also published as: CN102571959A

Abstract

The invention discloses a data download system and method. The system comprises a plurality of download clusters, each comprising a Linux Virtual Server (LVS), at least two Nginx download nodes, and a distributed file system (HDFS); the operating system hosting Nginx mounts the HDFS storage service through the Filesystem in Userspace (FUSE). The LVS receives a user's download request, schedules among the Nginx nodes to select one, and forwards the request to the selected Nginx. On receiving the request forwarded by the LVS, the Nginx accesses the data stored in HDFS through FUSE and responds to the user's download request. The HDFS stores the data. The invention can meet download service requirements when the volume of stored data is large, the number of files is large, and the access hotspots of some services' users are not concentrated.

Description

Translated from Chinese

A data download system and method

Technical field

The invention relates to the field of computer technology, and in particular to a data download system and method.

Background

With the development of information technology, people are increasingly accustomed to obtaining data of all kinds over the network. For example, a common approach is to download the desired content through a data download system.

Two download architectures are currently in wide use. One is the traditional multi-layer cache architecture of a CDN (Content Delivery Network); the other is the download architecture commonly used by video websites.

FIG. 1 shows a schematic of the CDN multi-layer cache architecture. In this architecture, the full data set is stored only on the top-level origin servers. On each edge service node, Nginx acts as a reverse proxy and forwards requests to a back-end Squid; on receiving a user request, Squid fetches the data from the origin and serves it. The front end uses DNS for load balancing. This architecture is generally suited to download services with a modest overall data volume and concentrated hotspots.

FIG. 2 shows the download architecture commonly used by video websites. In this architecture, several distributed storage clusters are set up globally, with no data redundancy, or only low redundancy, between clusters, and a layer-7 load-balancing device based on HTTP (HyperText Transfer Protocol) balances load across all Nginx servers globally. This architecture is generally suited to video download services, where overall concurrency is low and reads are mostly sequential.

However, download services in some scenarios today (for example, downloads of installers for large software packages and games) typically have the following characteristics: a large volume of stored data, a large number of files, and, for some services, user access hotspots that are not concentrated. The technical problem that those skilled in the art urgently need to solve is therefore how to provide a new download architecture that meets these requirements.

Summary of the invention

The invention provides a data download system and method that can meet download service requirements when the volume of stored data is large, the number of files is large, and the access hotspots of some services' users are not concentrated.

The invention provides the following solution:

A data download system comprising a plurality of download clusters, each download cluster comprising a Linux Virtual Server (LVS), at least two Nginx download nodes, and a distributed file system HDFS, wherein the operating system hosting Nginx mounts the HDFS storage service through the Filesystem in Userspace (FUSE); wherein:

the LVS is configured to receive a user's download request, schedule among the Nginx nodes to select one, and forward the download request to the selected Nginx;

the Nginx is configured to, after receiving the download request forwarded by the LVS, access the data stored in HDFS through FUSE and respond to the user's download request;

the HDFS is configured to store data.

Wherein the LVS is specifically configured to: after receiving the user's download request, schedule among the Nginx nodes according to each node's performance and/or current load, and send the download request to an Nginx whose performance and/or current load meets a preset condition.

Wherein there are two LVSs.

Wherein the two LVSs act as primary and backup for each other. Each LVS provides the download service to users through one virtual IP in primary mode and also holds one virtual IP in standby mode; when one LVS cannot provide service, the other LVS takes over its download service by activating the virtual IP that was in standby mode.

Wherein the HDFS comprises a name node and at least two data nodes; the name node shares a server with the LVS, and each data node shares a server with one Nginx.

Wherein the number of Nginx nodes equals the number of data nodes.

A data download method applied in a data download system, the data download system comprising a plurality of download clusters, each download cluster comprising a Linux Virtual Server (LVS), at least two Nginx download nodes, and a distributed file system HDFS, the HDFS being used to store data, and the operating system hosting Nginx mounting the HDFS storage service through the Filesystem in Userspace (FUSE); the method comprising:

receiving a user's download request through the LVS, the LVS scheduling among the Nginx nodes to select one and forwarding the download request to the selected Nginx;

the Nginx, after receiving the download request forwarded by the LVS, accessing the data stored in HDFS through FUSE and responding to the user's download request.

Wherein scheduling among the Nginx nodes and forwarding the download request to the selected Nginx comprises:

scheduling among the Nginx nodes according to each node's performance and/or current load, and sending the download request to an Nginx whose performance and/or current load meets a preset condition.

Wherein there are two LVSs.

Wherein the method further comprises:

adjusting Nginx parameters, the adjusted parameters including one or more of: the Sendfile option, the number of worker processes, the maximum number of connections per process, the backlog parameter, and output-buffers.

Wherein the method further comprises:

adjusting HDFS parameters, the adjusted parameters including one or more of: the number of parameters of the read function, the length of the IPC server listen queue, the number of IPC Server worker threads, and the maximum number of data transfer threads.

According to the specific embodiments provided herein, the invention achieves the following technical effects:

First, because HDFS is used in the download system, a single download cluster can provide storage on the order of hundreds of terabytes, and storage servers can be added in a simple plug-in fashion to raise storage capacity and meet the demand for large data volumes. In addition, because no hardware RAID card is used and the disks are instead read in parallel through the HDFS service, each disk in a machine can be read independently. This suits the high I/O demand of user access with no hotspots and scattered file reads, and raises data throughput. Experimental data show that with twelve 7200 rpm SATA disks, the total IOPS of a single machine can exceed 1000; a single machine can sustain more than 15,000 concurrent connections and more than 1 Gb of bandwidth throughput.

Second, because each cluster exposes only the LVS to users, each cluster is highly cohesive: if one or more Nginx nodes cannot serve requests because of a failure or similar cause, the problem can be handled inside the cluster through the LVS (for example, by monitoring the availability of the back-end Nginx nodes and removing an abnormal Nginx within seconds), without affecting other clusters.

Third, the LVS can perform load balancing directly within the cluster.

Fourth, because every cluster stores the data, the data can be fully redundant across clusters, which ensures that traffic can be rescheduled and switched quickly.

Fifth, with the above structure the system is easy to scale. For example, if the network performance of the front-end LVS becomes a bottleneck, an independent LVS server can be added, or network cards can be added to the existing servers, to scale performance quickly. If storage space is insufficient or the download service cannot keep up with demand, storage space and download capacity can be expanded simply by adding HDFS data nodes; HDFS then automatically synchronizes data to the new nodes.

Description of the drawings

To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a download architecture in the prior art;

FIG. 2 is a schematic diagram of another download architecture in the prior art;

FIG. 3 is a schematic diagram of the system provided by an embodiment of the invention;

FIG. 4 is a schematic diagram of the physical deployment of the services in the system provided by an embodiment of the invention;

FIG. 5 is a flowchart of the method provided by an embodiment of the invention.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments fall within the scope of protection of the invention.

An embodiment of the invention first provides a data download system, shown schematically in FIG. 3. The system comprises a plurality of download clusters (only one is shown in the figure). Each download cluster comprises an LVS (Linux Virtual Server), at least two Nginx download nodes, and an HDFS (Hadoop Distributed File System); the operating system on which Nginx runs mounts the HDFS storage service through FUSE (Filesystem in Userspace).

To aid understanding of the embodiment, HDFS, LVS, Nginx, and FUSE are first introduced briefly.

HDFS is highly fault tolerant and is designed to be deployed on inexpensive hardware. It provides high-throughput access to application data and suits applications with very large data sets. Because of these properties, the embodiment applies HDFS to an online download service and provides a corresponding solution. In the prior art, by contrast, HDFS is generally used for offline data storage and has not been applied in an online data download architecture.

LVS is open-source software that provides simple load balancing on the Linux platform. LVS uses IP load balancing and content-based request distribution. Its scheduler has good throughput and distributes requests evenly across different servers for execution, and it can automatically mask server failures, so that a group of servers forms a high-performance, highly available virtual server. The structure of the server cluster is transparent to clients, and neither the client nor the server programs need to be modified.

Nginx is a high-performance HTTP and reverse proxy server, and a good choice when the number of concurrent connections is high.

FUSE is a Linux concept: a file system implemented entirely in user space, provided as a module for mounting certain network spaces onto the local file system. Traditionally the operating system supports file systems at the kernel level, and kernel-mode code is usually hard to debug, so productivity is low. With FUSE, productivity improves considerably and the work of providing a new file system for the operating system is simplified. FUSE is particularly suited to virtual file systems and network file systems.

On the basis of these properties, the embodiment provides a new type of data download system. Inside each cluster, data is stored in HDFS, and the outermost layer of the cluster exposes the LVS virtual IP to serve users; that is, the LVS receives the user's download request. After receiving the request, the LVS schedules among the Nginx nodes, selects a suitable Nginx, and forwards the user's download request to it. After receiving the request, the Nginx accesses the data stored in HDFS through FUSE and so responds to the user's download request. When scheduling, the LVS can take into account each Nginx node's performance (including the hardware performance of its server) and/or current load, select an Nginx whose performance and/or current load meets a preset condition, and send the user's download request to that Nginx.
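The selection step described above can be sketched as follows. The patent does not name a scheduling algorithm; this sketch assumes a weighted least-connections policy (one of the schedulers LVS offers), with a static weight standing in for "performance" and an active connection count standing in for "current load". The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NginxNode:
    name: str
    weight: int            # assumed static capacity hint (e.g. hardware performance)
    active_conns: int      # assumed measure of current load
    healthy: bool = True   # nodes marked unhealthy are never selected


def pick_node(nodes):
    """Return the healthy node with the lowest active_conns / weight ratio."""
    candidates = [n for n in nodes if n.healthy and n.weight > 0]
    if not candidates:
        raise RuntimeError("no healthy Nginx node available")
    return min(candidates, key=lambda n: n.active_conns / n.weight)


def dispatch(nodes):
    """Select a node for one download request and account for the new connection."""
    node = pick_node(nodes)
    node.active_conns += 1
    return node.name


if __name__ == "__main__":
    cluster = [
        NginxNode("nginx-1", weight=1, active_conns=10),
        NginxNode("nginx-2", weight=2, active_conns=10),
        NginxNode("nginx-3", weight=4, active_conns=0, healthy=False),
    ]
    print(dispatch(cluster))
```

Dividing load by weight lets a more capable server carry proportionally more connections before it stops being preferred; the "preset condition" in the text could equally be a threshold rather than a minimum.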

With the above architecture, the data download system provided by the embodiment has the following characteristics:

Because the system can be extended, there may be multiple LVSs in practice. For example, in the schematic of FIG. 3 there may be two LVSs serving downloads to users at the same time. The two LVSs can then act as primary and backup for each other: each LVS provides the download service through one virtual IP in primary mode and also holds one virtual IP in standby mode. When one LVS cannot provide service, the other LVS takes over its download service by activating the virtual IP that was in standby mode. This mechanism further ensures high reliability within a single cluster and prevents an LVS failure from causing the download service to fail.
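The mutual primary/backup arrangement can be illustrated with a small in-memory model. This is only a sketch of the bookkeeping, not of how the virtual IPs are actually moved on the network; the patent does not specify the failover tooling, and the class name, function names, and addresses below are hypothetical.

```python
class LvsDirector:
    """One LVS: serves its primary virtual IP and holds the peer's as standby."""

    def __init__(self, name, primary_vip, standby_vip):
        self.name = name
        self.alive = True
        self.active_vips = {primary_vip}   # VIPs this director currently answers on
        self.standby_vip = standby_vip     # peer's VIP, inactive until failover

    def take_over(self):
        """Activate the standby VIP so this director also serves the peer's traffic."""
        self.active_vips.add(self.standby_vip)


def handle_failure(failed, peer):
    """Mark one director as down and let its peer take over the download service."""
    failed.alive = False
    failed.active_vips.clear()
    peer.take_over()


def owner_of(vip, directors):
    """Return the name of the live director currently serving the given VIP."""
    for d in directors:
        if d.alive and vip in d.active_vips:
            return d.name
    return None


if __name__ == "__main__":
    lvs_a = LvsDirector("lvs-a", primary_vip="192.0.2.1", standby_vip="192.0.2.2")
    lvs_b = LvsDirector("lvs-b", primary_vip="192.0.2.2", standby_vip="192.0.2.1")
    handle_failure(lvs_a, lvs_b)
    print(owner_of("192.0.2.1", [lvs_a, lvs_b]))
```

Because each director is primary for one virtual IP and standby for the other, both carry traffic in normal operation and neither sits idle.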

HDFS also has the following properties. HDFS has a master/slave structure: an HDFS cluster consists of a name node (NameNode, NN) and data nodes (DataNode, DN). The name node is a master server that manages the file namespace and regulates client access to files; the data nodes manage storage. In other words, the name node performs file and directory operations on the file namespace, such as open, close, and rename, and also determines the mapping of data to data nodes. The data nodes serve read and write requests from file system users and carry out data creation, deletion, and replication as instructed by the name node. Both the name node and the data nodes are software running on ordinary machines.

In practice, to make full use of hardware resources, reduce the size of the system, and improve maintainability, the services can be physically deployed as follows:

Referring to FIG. 4, the HDFS name node (NN) service shares servers with the LVS service; for example, in FIG. 4, when there are two LVSs, the system may also have two name nodes. A data synchronization client is deployed on the name node, together with FUSE mounting HDFS, and data is updated into HDFS through FUSE. While serving downloads, the name node responds only to a small number of RPC requests from inside the cluster, so its load is very low; the LVS service is CPU- and network-intensive but highly efficient. Sharing servers in this way therefore makes full use of system resources without affecting the HDFS service.

In addition, as shown in FIG. 4, each data node can share a server with one Nginx; that is, FUSE and Nginx are deployed on each data node (DN) to provide the download service. In this case every data node can store the complete data set, and the number of Nginx nodes can equal the number of data nodes, so that data is also fully redundant across the download nodes within a cluster. This mechanism further improves the reliability of the system. Moreover, owing to the characteristics of HDFS, the great majority of requests for HDFS data are served by the HDFS service on the local machine, with no cross-machine access, which improves response time and avoids intranet transfer bottlenecks. In practice, partial data redundancy between Nginx nodes is also possible.

Note that LVS has two routing modes: NAT mode and DR (Direct Routing) mode. In the embodiment, the LVS can work in DR mode. In this mode, inbound traffic (user requests) passes through the LVS, but outbound traffic (response data) is returned directly from the back-end data nodes and no longer flows through the LVS. This avoids the problem in NAT mode, where both inbound and outbound traffic pass through the LVS and the LVS becomes a network bandwidth bottleneck.
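To make the bandwidth argument concrete, the following sketch counts the bytes that cross the LVS for one download under each mode. It is a simplified accounting model under stated assumptions (one request, one response, no protocol overhead or acknowledgements); the function name and the example sizes are illustrative and do not come from the patent.

```python
def director_traffic(request_bytes, response_bytes, mode):
    """Bytes that pass through the LVS for a single download in the given mode."""
    if mode == "NAT":
        # Both directions traverse the director.
        return request_bytes + response_bytes
    if mode == "DR":
        # Only the inbound request traverses the director;
        # the response goes from the back-end node straight to the user.
        return request_bytes
    raise ValueError(f"unknown LVS mode: {mode!r}")


if __name__ == "__main__":
    request = 1024                   # assumed size of a download request
    response = 100 * 1024 * 1024     # assumed size of the downloaded file
    for mode in ("NAT", "DR"):
        print(mode, director_traffic(request, response, mode))
```

For a download workload the response dominates the request by several orders of magnitude, which is why keeping responses off the director matters far more here than it would for a request-heavy service.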

In a concrete implementation, to better meet the download service's requirements for high concurrency, low latency, and high data throughput, the Nginx parameters and the HDFS service can also be tuned.

Tuning the Nginx parameters can include: (a) turning off the Sendfile option; (b) increasing the number of worker processes from the default of 1 to 256; (c) setting the maximum number of connections per process to 512; (d) setting the backlog parameter to 204800; (e) setting output-buffers to "1128k". Tuning the HDFS service can include: (a) the original read function, public synchronized int read(byte buf[], int off, int len) throws IOException, takes three parameters: buf (the buffer), off (the offset of the data within buf), and len (the read length); the embodiment can add a fourth parameter, position (the offset within the file), to reduce data read-ahead and lower read-request pressure; (b) increasing the IPC server listen queue length from 128 to 1024; (c) increasing the number of IPC Server worker threads from 3 to 1024; (d) increasing the maximum number of data transfer threads from 256 to 4096. In practice, only some of these parameters may be adjusted, their values may be set differently, or other parameters may be adjusted, as actual needs dictate; no limitation is intended here.
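The idea behind adding a position parameter to the read call can be illustrated with a positional read on an ordinary local file. This is an analogy only, not the patent's HDFS change: it uses Python's `os.pread` (available on Unix-like systems), and the helper name `read_at` is hypothetical. The caller states the absolute file offset on every call, so the read does not depend on, or advance, a shared stream position.

```python
import os
import tempfile


def read_at(fd, position, length):
    """Read up to `length` bytes starting at absolute file offset `position`.

    The file descriptor's own offset is left unchanged, so concurrent callers
    can read different regions of the same file without coordinating seeks.
    """
    return os.pread(fd, length, position)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
        path = tmp.name
    fd = os.open(path, os.O_RDONLY)
    try:
        print(read_at(fd, 4, 3))
        print(os.lseek(fd, 0, os.SEEK_CUR))  # descriptor offset after the read
    finally:
        os.close(fd)
        os.remove(path)
```

A download server handling HTTP range requests for many clients issues exactly this kind of scattered, offset-addressed read, which is one plausible reason an explicit offset helps more here than sequential read-ahead does.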

Corresponding to the data download system provided by the embodiment, an embodiment of the invention also provides a data download method applied in the data download system shown in FIG. 3. Referring to FIG. 5, the method comprises the following steps:

S501: receiving a user's download request through the LVS, the LVS scheduling among the Nginx nodes to select one and forwarding the download request to the selected Nginx;

S502: the Nginx, after receiving the download request forwarded by the LVS, accessing the data stored in HDFS through FUSE and responding to the user's download request.

In a concrete implementation, after receiving the user's download request the LVS can schedule among the Nginx nodes according to each node's performance and/or current load, and send the download request to an Nginx whose performance and/or current load meets a preset condition. In addition, there are two LVSs in the system, and they can act as primary and backup for each other: each LVS provides the download service through one virtual IP in primary mode and also holds one virtual IP in standby mode; when one LVS cannot provide service, the other LVS takes over its download service by activating the virtual IP that was in standby mode.

To make full use of hardware resources, reduce the size of the system, and improve maintainability, the HDFS name node can share a server with the LVS, and each HDFS data node can share a server with one Nginx. The number of Nginx nodes can equal the number of data nodes.

In addition, in practice, to better meet the download service's requirements for high concurrency, low latency, and high data throughput, the Nginx parameters and the HDFS service can also be tuned.

From the above description of the embodiments, those skilled in the art will clearly understand that the invention can be implemented by software together with the necessary general-purpose hardware platform. On this understanding, the technical solution of the invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product can be stored on a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in parts of the embodiments, of the invention.

The embodiments in this specification are described progressively: parts that are the same or similar across embodiments can be cross-referenced, and each embodiment focuses on how it differs from the others. In particular, because the device and system embodiments are broadly similar to the method embodiments, they are described more briefly, and the relevant parts of the method embodiments can be consulted. The device and system embodiments described above are only illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected as actual needs require to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

The data download system and method provided by the invention have been described in detail above. Specific examples have been used to explain the principles and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method of the invention and its core idea. Those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.