CN108228743A

Info

Publication number: CN108228743A
Application number: CN201711362882.6A
Authority: CN
Inventors: 张云翔; 饶竹; 饶竹一
Original assignee: Shenzhen Power Supply Bureau Co Ltd
Current assignee: Shenzhen Power Supply Bureau Co Ltd
Priority date: 2017-12-18
Filing date: 2017-12-18
Publication date: 2018-06-29

Abstract

The invention provides a real-time big data search engine system built on Apache Lucene and the HTTP protocol. Its architecture comprises at least: a collector that gathers documents and data in various formats; an indexer that creates an index for each document from the results of analyzing that document; an index library that stores the indexes; an information resource library that holds the indexed documents; a searcher that receives query information entered at an external client and returns the query results; and a big data kernel that performs the search.

Description

A real-time big data search engine system

Technical Field

The invention relates to the field of Internet technology, and in particular to a real-time big data search engine system.

Background

With the development of information technology, and especially the rapid rise and spread of social networks, the mobile Internet, the Internet of Things, and big data applications, the data generated by human society is growing explosively. The data created worldwide every two days is now equivalent to all the data humans created from the beginning of civilization up to 2003, and the volume is still growing at 50% per year. This rapid expansion has brought humanity into a new era of "big data", in which data has become a strategic resource and factor of production as important as natural and human resources. Faced with data on this scale, quickly obtaining the data one needs from the mass, and extracting the knowledge one needs from it, is a present-day challenge.

Traditional network application architectures are mainly the C/S model (or B/S), where S stands for Server, B for Browser, and C for Client; the only difference between the two is whether the main business logic sits on the client or on the server. Taking C/S as an example, data produced when the client interacts with the user through the UI (user interface) is generally submitted over the network to the server for business processing, and the processed business data is stored in a database or file system for later use, such as queries, statistics, and data mining. With big data (usually meaning terabyte-scale volumes), the analysis and processing bottlenecks of this architecture are concentrated in database and file-system I/O, memory, and CPU capacity, so the system responds too slowly or not at all. Such systems are also usually not scalable: adding storage and computing resources does not improve their performance.

Summary of the Invention

The technical problem to be solved by the invention is to provide a real-time big data search engine system that effectively supports searching real-time streaming data.

To solve the above technical problem, the invention provides a real-time big data search engine system comprising:

a collector, which collects documents and data in various formats from outside the real-time big data search engine system;

an indexer, which extracts information from the documents in various formats and the database data collected by the collector, selects a text analyzer matching each document's type to analyze its text, and creates an index for each document;

an index library, which collects and stores the indexes generated by the indexer;

an information resource library, which gathers the indexed documents and associates each with its corresponding index in the index library;

a searcher, which receives query information entered at an external client, generates and forwards a search request, and sorts the search results before returning them to the external client;

a big data kernel, which receives the search request forwarded by the searcher, retrieves the corresponding index in the index library, extracts the corresponding files from the information resource library, and returns the retrieval results and the retrieved files to the searcher.

In an optional embodiment, the real-time big data search engine system is implemented by at least one server.

In an optional embodiment, the indexer is specifically configured to create the index for each document in accordance with the open-source full-text search engine toolkit in the Apache Web server.

In an optional embodiment, the search request follows the format defined by the open-source full-text search engine toolkit in the Apache Web server and is ultimately transmitted over the Hypertext Transfer Protocol.

In an optional embodiment, when creating each index, the indexer maps that index to the ID value of the corresponding document;

when looking up an index, the real-time big data search engine retrieves the corresponding index in the index library according to the user's input and, from that index, maps to the ID value of the corresponding document, so that the search result and the document can be returned.

In an optional embodiment, the search request includes at least one of keyword search, full-text search, and associated search.

In an optional embodiment, the index and related files become searchable in the searcher only after the addition or modification request has been received by the real-time big data search engine and the user has confirmed and committed it.

In an optional embodiment, when the deletion request includes an ID value, the ID value indicates deletion of the document having that ID value; when the deletion request includes a query index, the query index indicates deletion of all documents found by that query index.

In an optional embodiment, the documents in various formats collected by the collector are all stored in Extensible Markup Language (XML) form.

In an optional embodiment, the system has an extensible plug-in system, and faster data processing and analysis are achieved through various plug-ins.

In an optional embodiment, the extensible plug-ins include word segmenters such as IKAnalyzer, Mmseg4j, and Paoding, and the Solr_Pager paging tool.

The beneficial effects of the embodiments of the invention are as follows:

First, the real-time big data search engine system provides full-text search over real-time streaming data together with distributed computing, which can improve the response speed of data analysis and processing and suits applications with very large data sets. Second, it can be implemented by multiple servers in a scalable distributed architecture, which facilitates dynamic deployment of servers and allows data to be managed concurrently by adding hardware or configuring multiple servers. Third, it has an extensible plug-in system, which lets the real-time big data search engine process and analyze data more quickly.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the functional architecture of one embodiment of the real-time big data search engine of the invention.

Fig. 2 is a functional architecture and workflow diagram of one embodiment of the real-time big data search engine of the invention.

Detailed Description

The following embodiments are described with reference to the drawings to illustrate specific embodiments in which the invention can be practiced.

An embodiment of the invention provides a real-time big data search engine system, for example a smart device such as a computer, a tablet computer, or a handheld computer. As shown in Fig. 1 and Fig. 2, the real-time big data search engine system provided by the invention may include:

Collector 1, which collects documents and data in various formats from outside the real-time big data search engine system. In an optional embodiment, collector 1 may be a data transceiver connecting the above smart device to a data source (not shown), for example a transceiver module such as a USB interface, an antenna, or a display screen. The data source may be another smart device. The documents and data in various formats may include request operations such as add, modify, delete, and query. In a specific implementation, the user may send these add, modify, delete, and query requests to the collector in HTTP format.

Indexer 2, which extracts information from the documents in various formats and the database data collected by collector 1, selects a text analyzer matching each document's type to analyze its text, and creates an index for each document. In an optional embodiment, indexer 2 creates the index in accordance with the open-source full-text search engine toolkit in the Apache Web server (Apache Lucene).

Index library 5, which collects and stores the indexes generated by indexer 2;

Information resource library 6, which gathers the indexed documents and associates each with its corresponding index in index library 5;

Searcher 3, which receives query information entered at an external client, generates a search request and forwards it to big data kernel 4, and sorts the search results from big data kernel 4 before returning them to the user. In an optional embodiment, the search request includes at least one of keyword search, full-text search, and associated search; it follows the format defined by the open-source full-text search engine toolkit in the Apache Web server and is ultimately transmitted over the Hypertext Transfer Protocol (HTTP).

Big data kernel 4, which receives the search request forwarded by searcher 3, retrieves the corresponding index in index library 5, extracts the corresponding files from information resource library 6, and returns the retrieval results and the retrieved files to searcher 3.

In an optional embodiment, indexer 2, index library 5, searcher 3, and big data kernel 4 may be software functional modules. These modules may be distributed across different hardware modules (for example, multiple DSP processors) or different distributed servers, or implemented centrally by a single central processing unit (CPU).

How the real-time big data search engine system (referred to below, for example, as the ROSE system) carries out high-speed real-time big data computation and processing is explained with reference to the workflow diagram in Fig. 2.

First, collector 1 of the ROSE system collects various documents and database data from data sources outside the system and classifies the documents. Indexer 2 then selects a different text analyzer for each document type to analyze the text, creates an index for each document based on the search habits and keywords commonly used at the client, and places the index in index library 5. In this embodiment, indexer 2 creates the index in accordance with the open-source full-text search engine toolkit in the Apache Web server (Apache Lucene). The specific steps for creating the index are as follows:

1. Specify the directory in which the index will be created;

2. Create a Directory object;

3. Create the index-writing object, IndexWriter;

4. Obtain the File array of the source files to determine what will be indexed;

5. Write each document into the index in a loop: first create a Document object and Field objects, which correspond respectively to a row in a database table and the column attributes of that row; then add each Field to the Document; finally, have IndexWriter call addDocument to write the document's index into the index database;

6. Close the index-writing object, IndexWriter.
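
The loop in steps 3 to 6 can be illustrated with a minimal Python sketch of an in-memory inverted index. This is not Lucene and not the patent's implementation: `ToyIndexWriter`, `build_index`, and the sample documents are hypothetical names and data chosen for illustration, and tokenization is reduced to lower-casing and splitting on word characters.

```python
import re
from collections import defaultdict


class ToyIndexWriter:
    """Hypothetical stand-in for the role IndexWriter plays in steps 3-6."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document IDs
        self.stored = {}                  # document ID -> stored fields
        self.closed = False

    def add_document(self, doc_id, fields):
        """Step 5: index every field of one document under its ID."""
        if self.closed:
            raise RuntimeError("writer is closed")
        self.stored[doc_id] = dict(fields)
        for text in fields.values():
            for term in re.findall(r"\w+", text.lower()):
                self.postings[term].add(doc_id)

    def close(self):
        """Step 6: refuse further writes."""
        self.closed = True


def build_index(documents):
    """Loop over the source documents and write each one into the index."""
    writer = ToyIndexWriter()
    for doc_id, fields in documents.items():
        writer.add_document(doc_id, fields)
    writer.close()
    return writer


index = build_index({
    "doc-1": {"title": "Transformer load report", "body": "Peak load in July"},
    "doc-2": {"title": "Cable inspection", "body": "Report on cable faults"},
})
```

Each term maps to the set of document IDs that contain it, which is the kind of index-to-ID mapping the description relies on when it later resolves a hit back to its document.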

Once created, the index is stored in the index library. The document corresponding to the index is assigned an ID value and collected in information resource library 6, and the index is mapped one-to-one to that document ID so that the document can later be found through the index.

While the index is being created, the system also sets up the corresponding retrieval procedure for it, with the following steps:

1. Create the index-reading object, IndexReader;

2. Create the search object, IndexSearcher;

3. Create the lexical analysis object, Analyzer;

4. Create the syntax analysis object, QueryParser;

5. QueryParser calls parse to perform syntax analysis, producing a query syntax tree that is held in a Query;

6. IndexSearcher calls the search method on the Query syntax tree and obtains the result set, TopDocs;

7. Obtain the corresponding ScoreDoc entries from TopDocs;

8. Obtain the corresponding Document from each ScoreDoc;

9. Obtain the corresponding Field attributes from the Document.
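
As a rough illustration of the flow in steps 3 to 9 (analyze the query, search, rank, then resolve hits to documents and fields), here is a small Python sketch. It is not Lucene: the `POSTINGS` and `STORED` tables, the function names, and the scoring rule (count of matching query terms) are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical in-memory index: term -> document IDs, plus stored fields per ID.
POSTINGS = {
    "load": {"doc-1"},
    "report": {"doc-1", "doc-2"},
    "cable": {"doc-2"},
}
STORED = {
    "doc-1": {"title": "Transformer load report"},
    "doc-2": {"title": "Cable inspection report"},
}


def parse_query(text):
    """Steps 3-5: analyze the query text into a list of lower-cased terms."""
    return re.findall(r"\w+", text.lower())


def search(terms, top_n):
    """Steps 6-7: score each document by how many query terms it contains."""
    scores = Counter()
    for term in terms:
        for doc_id in POSTINGS.get(term, ()):
            scores[doc_id] += 1
    # Sort by descending score, then by ID so that ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


def fetch_fields(hits):
    """Steps 8-9: resolve each hit to its stored document and read a field."""
    return [(doc_id, score, STORED[doc_id]["title"]) for doc_id, score in hits]


results = fetch_fields(search(parse_query("cable report"), top_n=10))
```

Here `results` lists `doc-2` first because it matches both query terms, followed by `doc-1`, which matches only one.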

With retrieval running inside the system according to these steps, the user only needs to enter the query keywords. Searcher 3 forwards the search request (for example a keyword search, full-text search, or associated search) to big data kernel 4. Big data kernel 4 then retrieves the corresponding index in index library 5 according to the search request and returns to searcher 3 both the retrieval results and the files with the corresponding ID values, extracted from the information resource library according to the mapping. Searcher 3 then sorts the search results, returns them to the client, and caches them. As noted above, the search request includes at least one of keyword search, full-text search, and associated search; it can likewise follow the format defined by the open-source full-text search engine toolkit in the Apache Web server and is ultimately transmitted over the Hypertext Transfer Protocol (HTTP).
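
The description says only that the search request follows the toolkit's format and travels over HTTP. As one possible reading, the Python sketch below packs a query written in Lucene's classic `field:term` query syntax into an HTTP GET URL. The host name, the `/select` path, and the `q`, `start`, and `rows` parameter names are assumptions for illustration and do not come from the patent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base_url, query, start=0, rows=10):
    """Encode a query string and paging window as URL parameters.

    The parameter names are hypothetical; a real deployment would use
    whatever names its search endpoint defines.
    """
    params = {"q": query, "start": start, "rows": rows}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; ".invalid" is a reserved, non-resolving domain.
url = build_search_url(
    "http://search.example.invalid/select",
    "title:cable AND body:fault",
)

# Decoding the query string recovers the original parameters.
decoded = parse_qs(urlsplit(url).query)
```

Percent-encoding matters here because characters such as `:` and spaces are part of the query syntax and would otherwise be ambiguous inside a URL.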

All of the above steps together complete one search. In this embodiment, the documents in various formats collected by collector 1 are all stored in Extensible Markup Language (XML) form, and responses are likewise returned to the user in XML form. In other optional embodiments, documents may be stored as files in other formats; this does not affect the client's ability to query and search them.
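
To make the XML storage concrete, the following Python sketch serializes a document to XML and reads it back using only the standard library. The element layout (`doc` containing `field` elements with a `name` attribute) and the field names are assumptions for illustration; the patent does not specify a schema.

```python
import xml.etree.ElementTree as ET


def document_to_xml(doc_id, fields):
    """Serialize one document as <doc><field name="...">...</field></doc>."""
    doc = ET.Element("doc")
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = doc_id
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(doc, encoding="unicode")


def xml_to_document(xml_text):
    """Parse the XML back into a flat {field name: value} dictionary."""
    doc = ET.fromstring(xml_text)
    return {field.get("name"): field.text for field in doc.findall("field")}


xml_text = document_to_xml(
    "doc-1", {"title": "Load report", "body": "Peak load in July"}
)
round_trip = xml_to_document(xml_text)
```

Storing the ID as an ordinary field keeps the document self-describing, so the same XML can serve both as the stored form and as the response body.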

To use distributed computing more efficiently, in this embodiment the real-time big data search engine system can use Zookeeper (open-source distributed application service software) to set up at least one server cluster. The cluster shares the server resources dispersed across the clients to jointly provide the real-time big data search engine function, achieving high-speed computation and storage and improving response speed.

Because the ROSE search engine system of the invention can be built on a clustered server architecture, different users can each operate on their own documents. On receiving a request, the engine performs the corresponding operation, such as add, modify, delete, or query, on the matching internal index and documents.

In an optional embodiment, to improve security, add and modify requests issued by the user take effect only after the user confirms and commits them. Only then are the corresponding index and documents added or updated inside the search engine system, and only then can the system find the new or updated documents. When deleting by ID value, the user can delete only the document with that ID; when deleting by query index, all documents returned by the query are deleted.
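
The commit and delete behaviour described above can be sketched in a few lines of Python. `CommitGatedIndex` and its methods are hypothetical names, and search is reduced to whole-word matching. The text does not say whether deletions also wait for a commit; this sketch assumes they apply immediately.

```python
class CommitGatedIndex:
    """Toy index where adds and updates are invisible until commit()."""

    def __init__(self):
        self.visible = {}  # doc ID -> text that search can see
        self.pending = {}  # doc ID -> text awaiting user confirmation

    def add_or_update(self, doc_id, text):
        """Buffer an add or modify request; it is not yet searchable."""
        self.pending[doc_id] = text

    def commit(self):
        """User confirmation: publish all buffered adds and updates."""
        self.visible.update(self.pending)
        self.pending.clear()

    def search(self, term):
        """Return the sorted IDs of visible documents containing the word."""
        term = term.lower()
        return sorted(
            doc_id for doc_id, text in self.visible.items()
            if term in text.lower().split()
        )

    def delete_by_id(self, doc_id):
        """Delete exactly one document; return how many were removed."""
        return 1 if self.visible.pop(doc_id, None) is not None else 0

    def delete_by_query(self, term):
        """Delete every document the query matches; return the count."""
        matches = self.search(term)
        for doc_id in matches:
            del self.visible[doc_id]
        return len(matches)


idx = CommitGatedIndex()
idx.add_or_update("a", "cable fault report")
idx.add_or_update("b", "cable inspection")
idx.add_or_update("c", "load forecast")
before_commit = idx.search("cable")
idx.commit()
after_commit = idx.search("cable")
removed_by_id = idx.delete_by_id("c")
removed_by_query = idx.delete_by_query("cable")
```

Before the commit the search finds nothing; afterwards it finds both cable documents. Deleting by ID removes one document, while deleting by query removes every match.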

To let the ROSE search engine system process and analyze data more quickly, the system exposes an extensible plug-in system. Word segmenters such as IKAnalyzer, Mmseg4j, and Paoding can be installed to provide Chinese word segmentation, and the Solr_Pager paging tool can be installed to provide paginated search results. In other optional embodiments, further plug-ins can be exposed as needed within the HTTP- and Web-based framework to provide additional functions.
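
One simple way to picture a plug-in system for word segmenters is a registry that maps a plug-in name to a tokenizer function. The Python sketch below is only an illustration: the registry, the decorator, and both tokenizers are hypothetical, and the character-bigram tokenizer is a crude stand-in for a real Chinese segmenter such as those named above, not a reimplementation of any of them.

```python
TOKENIZERS = {}  # plug-in name -> tokenizer function


def register_tokenizer(name):
    """Decorator that installs a tokenizer under the given plug-in name."""
    def decorator(func):
        TOKENIZERS[name] = func
        return func
    return decorator


@register_tokenizer("whitespace")
def whitespace_tokenizer(text):
    """Split on whitespace; adequate for space-delimited languages."""
    return text.split()


@register_tokenizer("bigram")
def bigram_tokenizer(text):
    """Emit overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def tokenize(text, plugin):
    """Look up the named plug-in and apply it to the text."""
    return TOKENIZERS[plugin](text)


english_tokens = tokenize("real time search", "whitespace")
chinese_tokens = tokenize("实时搜索", "bigram")
```

New segmenters can be added by registering another function, without touching the code that calls `tokenize`.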

The ROSE search engine system described above provides full-text search over real-time streaming data and a mode of operation in which distributed systems compute jointly, making it suitable for applications with very large data sets.

As the above description shows, the beneficial effects of the invention are as follows:

First, the real-time big data search engine provides full-text search over real-time streaming data together with distributed computing, which can improve the response speed of data analysis and processing and suits applications with very large data sets. Second, the scalable distributed computing architecture supports dynamic deployment, so data can be managed concurrently by adding hardware or configuring multiple servers. Third, the extensible plug-in system lets the real-time big data search engine process and analyze data more quickly.

The above discloses only preferred embodiments of the invention and does not limit the scope of its claims. Equivalent changes made in accordance with the claims of the invention therefore remain within the scope covered by the invention.

Claims

1. A real-time big data search engine system, characterized by comprising:

2. The real-time big data search engine system according to claim 1, characterized in that the real-time big data search engine system is implemented by at least one server.

3. The real-time big data search engine system according to claim 1 or 2, characterized in that the indexer is specifically configured to create the index for each document in accordance with the open-source full-text search engine toolkit in the Apache Web server.

4. The real-time big data search engine system according to claim 1 or 2, characterized in that the search request follows the format defined by the open-source full-text search engine toolkit in the Apache Web server and is ultimately transmitted over the Hypertext Transfer Protocol.

5. The real-time big data search engine system according to claim 1, characterized in that, when creating each index, the indexer maps that index to the ID value of the corresponding document;

6. The real-time big data search engine system according to claim 4, characterized in that the search request includes at least one of keyword search, full-text search, and associated search.

7. The real-time big data search engine system according to claim 6, characterized in that the index and related files become searchable in the searcher only after the addition or modification request has been received by the real-time big data search engine and the user has confirmed and committed it.

8. The real-time big data search engine system according to claim 6, characterized in that, when the deletion request includes an ID value, the ID value indicates deletion of the document having that ID value, and when the deletion request includes a query index, the query index indicates deletion of all documents found by that query index.

9. The real-time big data search engine system according to claim 1, characterized in that the documents in various formats collected by the collector are all stored in Extensible Markup Language (XML) form.

10. The real-time big data search engine system according to claim 1, characterized by having an extensible plug-in system in which various plug-ins provide faster data processing and analysis, the extensible plug-ins including the IKAnalyzer, Mmseg4j, and Paoding word segmenters and the Solr_Pager paging tool.