CN108763562A


Info

Publication number: CN108763562A
Application number: CN201810563153.5A
Authority: CN
Inventors: 王济平; 黎刚; 周健雄; 汤克云
Original assignee: Guangdong Jingxin Software Technology Co ltd
Current assignee: Guangdong Jingxin Software Technology Co ltd
Priority date: 2018-06-04
Filing date: 2018-06-04
Publication date: 2018-11-06

Abstract

The invention provides a construction method for improving data exchange efficiency based on big data technology, comprising the following steps: 1) constructing a central end: establishing a central database comprising an RDBMS-based basic information base and an HDFS-based exchange database, and establishing a bridge interface comprising a collection service program and a distribution service program based on Flume and Kafka, wherein Flume collects data from various sources and in various forms and passes it to a Kafka cluster, which distributes it uniformly to a big data cluster for processing; 2) constructing a data provider or data user: establishing a business system, a business library, and a bridge interface; 3) constructing a data user: establishing a business system, a business library, and a bridge interface. The invention effectively expands the types of data that can be stored, and its distributed technology greatly improves the efficiency of reading and writing data, giving the platform higher throughput and reliability so that it can handle massive data volumes and data exchange tasks.

Description

Translated from Chinese

A construction method for improving data exchange efficiency based on big data technology

Technical field

The invention relates to big data technology, and in particular to a construction method for improving data exchange efficiency based on big data technology.

Background

In a traditional data exchange platform, an exchange generally involves a central node, front-end exchange nodes, and bridge programs. Each exchange node has a set of input and output interfaces through which data can be written to or read from the node, and the front-end machine of each end node carries a front-end exchange library that stores the data being exchanged (see Figure 1). Exchanging a single record in this way takes 3 writes and 3 reads, and the process is also affected by the carrier (server resources) and by system performance (database I/O speed). When the data volume is small, the repeated writes and reads have no noticeable effect on efficiency. Once the exchange volume reaches a certain scale, however, such as more than 100 million records in a single exchange, exchange efficiency suffers directly. This loss cannot be recovered through hardware upgrades alone; the architecture itself must be adjusted and optimized.
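The read and write counts above can be made concrete with a small simulation. This is only a sketch under stated assumptions: the text does not say which hops make up the 3 writes and 3 reads, so the chain used here (provider business library, provider front-end library, central library, consumer front-end library) is one plausible reading, and `CountingStore` and `relay` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CountingStore:
    """In-memory stand-in for a database that counts its I/O operations."""
    name: str
    rows: list = field(default_factory=list)
    reads: int = 0
    writes: int = 0

    def write(self, row):
        self.writes += 1
        self.rows.append(row)

    def read_all(self):
        self.reads += len(self.rows)
        return list(self.rows)


def relay(stores, records):
    """Seed the first store, then copy every record hop by hop down the chain.

    Returns (total_reads, total_writes) across all stores.
    """
    for record in records:
        stores[0].rows.append(record)  # seeding is not counted as exchange I/O
    for source, target in zip(stores, stores[1:]):
        for row in source.read_all():
            target.write(row)
    return sum(s.reads for s in stores), sum(s.writes for s in stores)


# Assumed chain: three hops, so each record is read 3 times and written 3 times.
ASSUMED_CHAIN = (
    "provider business library",
    "provider front-end library",
    "central library",
    "consumer front-end library",
)
```

Calling `relay([CountingStore(n) for n in ASSUMED_CHAIN], records)` returns totals that grow linearly with the number of records, which is why the cost only becomes visible at large volumes.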

As society develops, data volumes keep growing, and traditional technology can no longer meet the demand for processing massive data. For example, existing government data exchange platforms are mainly built on ETL technology, so extraction, transformation, and loading are all affected by how software and hardware resources are configured. In addition, because exchange volumes are low when a platform is first built, data exchange platforms generally adopt a single-node design. Such a design runs normally while data volumes and exchange tasks are few. However, now that big data has been raised to the level of national strategy, every province and city is required to share, exchange, and open up data, which has caused business-system data volumes to surge and data exchanges to become ever more frequent. The drawbacks of the single-node design are gradually being exposed: at peak exchange times, the performance limits of a single server or of traditional technology mean that many exchange tasks cannot be processed in time, a large backlog of tasks builds up, and the data exchange platform gradually becomes a bottleneck in informatization.

Faced with growing data stocks and exchange scenarios, data exchange platforms urgently need an effective way to improve exchange efficiency.

Summary of the invention

To overcome the defects of the prior art, the present invention proposes a construction method for improving data exchange efficiency based on big data technology. Its purpose is to break through the existing exchange bottleneck and improve overall exchange efficiency. The specific technical content is as follows:

A construction method for improving data exchange efficiency based on big data technology, comprising:

1) Construction step for the central end

Establish a central database comprising an RDBMS-based basic information base and an HDFS-based exchange database;

Establish a bridge interface comprising a collection service program and a distribution service program based on Flume and Kafka. Flume collects data from various sources and in various forms and passes it to the Kafka cluster, which distributes it uniformly to the big data cluster for processing;

2) Construction step for the data provider or data user

Establish a business system, a business library, and a bridge interface;

3) Construction step for the data user

Establish a business system, a business library, and a bridge interface.
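The collect, queue, and distribute flow described in step 1) can be sketched in plain Python. This is not Flume or Kafka code: `collect`, `TopicQueue`, `distribute`, and `run_pipeline` are hypothetical in-memory stand-ins, and the JSON envelope and round-robin distribution are assumptions made for illustration.

```python
import json
from collections import defaultdict, deque


def collect(source_name, raw_items):
    """Collector stand-in: wrap each raw item from any source in a uniform envelope."""
    for item in raw_items:
        yield json.dumps({"source": source_name, "payload": item}, ensure_ascii=False)


class TopicQueue:
    """Message-cluster stand-in: buffers collected messages per topic in arrival order."""

    def __init__(self):
        self._topics = defaultdict(deque)

    def send(self, topic, message):
        self._topics[topic].append(message)

    def drain(self, topic):
        queue = self._topics[topic]
        while queue:
            yield queue.popleft()


def distribute(messages, worker_count):
    """Distribution stand-in: spread messages round-robin over processing workers."""
    batches = [[] for _ in range(worker_count)]
    for index, message in enumerate(messages):
        batches[index % worker_count].append(json.loads(message))
    return batches


def run_pipeline(sources, worker_count=2, topic="exchange"):
    """Collect from every source, queue on one topic, then hand out to workers."""
    queue = TopicQueue()
    for source_name, raw_items in sources.items():
        for message in collect(source_name, raw_items):
            queue.send(topic, message)
    return distribute(queue.drain(topic), worker_count)
```

For example, `run_pipeline({"db": [{"id": 1}], "log": ["line-a"]})` accepts both structured rows and raw text lines, which mirrors the claim that data of various sources and forms enters through a single collection path.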

Preferably, the basic information base is a relational database based on MySQL and/or Oracle, and the exchange database is a non-relational database based on HBase and/or MongoDB.

The beneficial effects of the present invention are as follows. The data exchange mode based on big data technology applies distributed technology to the central library and supports both NoSQL and relational databases, which effectively expands the types of data that can be stored, while the distributed technology greatly improves the efficiency of reading and writing data. Data providers and data users connect directly to the central database through interface programs, and the central database exchanges data with them through Flume and Kafka. The whole exchange process involves only 1 read and 1 write, which effectively improves overall exchange efficiency and gives the platform higher throughput and reliability, so that it can handle massive data volumes and exchange tasks. Overall exchange efficiency is more than twice that of a traditional ETL-based exchange platform. The technology also has the following advantages:

1) High performance: distributed technology and big data technologies such as Flume and Kafka give the platform high throughput and high reliability, effectively supporting the processing of massive data.

2) Easy to scale: computing capacity, storage capacity, and performance can be forecast and scaled elastically.

3) Low cost: distributed storage needs only an IP network and can be assembled from a few x86 servers with built-in hard disks, so the initial cost is relatively low.
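The difference between the 3 reads and 3 writes of the traditional platform and the 1 read and 1 write claimed here can be expressed as simple arithmetic. The sketch below counts database operations only; the batch size is an illustrative assumption, `io_operations` is a hypothetical helper, and an operation count is not a throughput measurement, so it should not be read as confirming the stated efficiency gain.

```python
def io_operations(records, reads_per_record, writes_per_record):
    """Total database reads plus writes needed to exchange `records` records."""
    return records * (reads_per_record + writes_per_record)


RECORDS = 100_000_000  # illustrative batch size, chosen for this example only

# Traditional platform: 3 reads and 3 writes per record.
traditional = io_operations(RECORDS, reads_per_record=3, writes_per_record=3)

# Proposed platform: 1 read and 1 write per record.
direct = io_operations(RECORDS, reads_per_record=1, writes_per_record=1)

reduction_factor = traditional / direct
```

Under these assumptions the direct path performs one third of the database operations. Actual speed-up depends on factors the model ignores, such as network transfer, serialization, and cluster size.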

Brief description of the drawings

Fig. 1 is a schematic framework diagram of a data exchange platform in the prior art.

Fig. 2 is a schematic framework diagram of the big data exchange platform constructed by the method of the present invention.

Detailed description

The solution of this application is further described below with reference to Figure 2:

1) Construction step for the central end

Establish a central database comprising a basic information base built on an RDBMS (Relational Database Management System) and an exchange database built on HDFS (Hadoop Distributed File System);

Establish a bridge interface comprising a collection service program and a distribution service program based on Flume and Kafka. Flume collects data from various sources and in various forms and passes it to the Kafka cluster, which distributes it uniformly to the big data cluster for processing. Specifically, the basic information base is a relational database based on MySQL and/or Oracle, and the exchange database is a non-relational database based on HBase and/or MongoDB;

3) Construction step for the data user

In terms of database technology, the present invention combines relational databases with NoSQL databases, which effectively expands the types of data that can be stored. On the relational side it supports mainstream products such as MySQL, Oracle, and Dameng; on the NoSQL side it mainly uses HDFS + HBase + MongoDB, which effectively meets the needs of storing big data and reading it quickly.
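The pairing of a relational store with a non-relational store can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: SQLite stands in for the relational side, a Python dict keyed by a row key stands in for the non-relational side, and the class, table, and key layout (`CentralDatabase`, `basic_info`, `party_id#sequence`) are hypothetical rather than taken from the patent.

```python
import sqlite3


class CentralDatabase:
    """Pairs a relational store for basic information with a key-value store for exchange data."""

    def __init__(self):
        # SQLite is a stand-in for the relational side named in the text.
        self.relational = sqlite3.connect(":memory:")
        self.relational.execute(
            "CREATE TABLE basic_info (party_id TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )
        # A dict keyed by row key is a stand-in for the non-relational side.
        self.exchange = {}

    def register_party(self, party_id, name):
        with self.relational:  # commits on success, rolls back on error
            self.relational.execute(
                "INSERT INTO basic_info (party_id, name) VALUES (?, ?)",
                (party_id, name),
            )

    def put_exchange_record(self, party_id, sequence, document):
        known = self.relational.execute(
            "SELECT 1 FROM basic_info WHERE party_id = ?", (party_id,)
        ).fetchone()
        if known is None:
            raise KeyError(f"unknown party: {party_id}")
        row_key = f"{party_id}#{sequence:010d}"
        self.exchange[row_key] = document  # schema-free: any dict shape is accepted
        return row_key

    def scan(self, party_id):
        """Return one party's exchange documents in row-key order."""
        prefix = f"{party_id}#"
        return [self.exchange[k] for k in sorted(self.exchange) if k.startswith(prefix)]
```

The design choice being illustrated is the split of responsibilities: fixed-schema reference data stays in the relational store, where constraints apply, while exchange payloads of varying shape go to the schema-free store and are retrieved by key prefix.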

For message processing, this technology uses Flume + Kafka. In this data exchange architecture, Flume mainly collects data from various sources and in various forms and passes it to the Kafka cluster, which distributes it uniformly to the big data cluster for processing. There are two reasons for choosing this combination. First, Flume supports customizing all kinds of data senders in a logging system in order to collect data, and it can also perform simple processing on the data and write it to a variety of (customizable) data receivers. Second, Kafka is essentially a publish-subscribe messaging system: a Producer publishes messages to a Topic, and a Consumer subscribes to the messages of a Topic. Whenever a new message arrives for a Topic, the Broker passes it to every Consumer subscribed to that Topic. In practice, Flume acts as the data producer, so data sources can be brought in without programming, and the Kafka Sink acts as the data consumer, which yields higher throughput and reliability. With these two core technologies, the platform can, first, process and allocate massive numbers of tasks in a timely manner and, second, process massive data exchanges more efficiently, thereby improving the overall efficiency of data exchange.
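The publish-subscribe behaviour described above (a producer publishes to a topic, and the broker passes each new message to every subscribed consumer) can be sketched in a few lines. This is an in-memory illustration with hypothetical `Broker` and `Consumer` classes, not the Kafka client API. It follows the paragraph's simplified push-style description; real Kafka consumers poll partitioned logs, and consumers in the same group share partitions rather than each receiving every message.

```python
from collections import defaultdict


class Consumer:
    """Keeps every message it is handed, in delivery order."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, topic, message):
        self.inbox.append((topic, message))


class Broker:
    """In-memory stand-in for the broker role described in the text."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, consumer):
        self._subscribers[topic].append(consumer)

    def publish(self, topic, message):
        """Deliver the message to every subscriber of the topic; return the delivery count."""
        subscribers = self._subscribers[topic]
        for consumer in subscribers:
            consumer.receive(topic, message)
        return len(subscribers)
```

A producer in this sketch is simply any caller of `Broker.publish`. The decoupling is the point: the publisher never needs to know which consumers exist or how many of them there are.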

The preferred embodiment above should be regarded as an illustration of how the solution of this application may be implemented. Any technical deduction, substitution, or improvement that is identical or similar to the solution of this application, or that is made on its basis, shall fall within the scope of protection of this patent.

Claims


1. A construction method for improving data exchange efficiency based on big data technology, characterized by comprising:

1) a construction step for the central end

2) a construction step for the data provider or data user

3) a construction step for the data user

2. The construction method for improving data exchange efficiency based on big data technology according to claim 1, characterized in that the basic information base is a relational database based on MySQL and/or Oracle, and the exchange database is a non-relational database based on HBase and/or MongoDB.