CN106326429A

Info

Publication number: CN106326429A
Application number: CN201610723701.7A
Authority: CN
Inventors: 童浩; 杨凡
Original assignee: Wuhan Optics Valley Information Technologies Co Ltd
Current assignee: Wuhan Optics Valley Information Technologies Co Ltd
Priority date: 2016-08-25
Filing date: 2016-08-25
Publication date: 2017-01-11

Abstract

The invention discloses a Solr-based HBase seconds-level query scheme, that is, a scheme for querying HBase with response times on the order of seconds. The scheme comprises the following steps: inserting raw data into the HBase column-oriented database; invoking MapReduce to incrementally update the index in Solr, which obtains the raw data and stores it on the Solr server in Solr's own document format; and, at query time, accessing the Solr server, building the index, searching the index first to obtain the rowkeys, and then querying the required result data from the main HBase table. The scheme offers fast and accurate search. Combining Solr with HBase allows massive data to be retrieved within seconds, and Solr's pagination function returns the rowkeys of one page of data at a time; because the number of records on each page is very limited, the HBase query by the rowkeys of that page responds quickly and can be kept at the millisecond level.

Description

A Solr-based HBase seconds-level query scheme

Technical field

The present invention relates to the technical field of HBase, and in particular to a Solr-based HBase seconds-level query scheme.

Background technology

Solr is a complete search service from Apache built on Lucene. Solr consists of two core components: an indexing component and a search component. The indexing component builds an index over the data that the search application needs indexed, while the search component queries the index in response to client requests. Solr is a high-performance full-text search server developed in Java 5 and based on Lucene. It extends Lucene, offering a richer query language than Lucene, and is configurable, extensible, and optimized for query performance; it also provides a complete administration interface, making it an excellent full-text search engine. Documents are added to a search collection as XML over HTTP. Querying the collection is likewise done over HTTP, with an XML/JSON response returned. Its key features include: efficient and flexible caching, vertical search, highlighting of search results, improved availability through index replication, a powerful data schema for defining fields and types and configuring text analysis, and a web-based administration interface.

HBase is the Hadoop family's distributed storage solution for massive data. When we query massive data stored in HBase by rowkey, responses arrive within seconds, which gives a fairly good user experience. In more complex scenarios, however, such as multi-condition queries on the data, the solutions HBase provides are far from ideal.

For multi-condition queries, HBase itself currently offers two mainstream solutions:

1. Building index tables manually through a coprocessor when inserting data

HBase has two kinds of coprocessor: Observer and Endpoint. An Observer is similar to a trigger in a relational database, and an Endpoint is similar to a stored procedure in a relational database.

To index a table with a coprocessor we use an Observer: when data is inserted into an HBase table, an Observer operation is added so that, before each record is inserted, our custom business logic is called to generate records for the indexed fields in the index table.

A multi-condition query against HBase is therefore split into two steps: the first step queries the index table with the query conditions to obtain the rowkeys of the matching results, and the second step queries the main table by those rowkeys for the data we need.
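The two-step lookup described above can be sketched as follows. This is a minimal in-memory illustration, not HBase code: the dictionaries `main_table` and `index_table`, the function names, and the field name `city` are all hypothetical stand-ins chosen for this example.

```python
# Hypothetical in-memory stand-ins: plain dicts play the role of HBase tables.
main_table = {}    # rowkey -> {field: value}
index_table = {}   # indexed field value -> set of main-table rowkeys

INDEXED_FIELD = "city"  # assumed field name, for illustration only


def put_with_observer(rowkey, row):
    """Insert a row; the index record is written synchronously first,
    the way an Observer hook would run before the insert."""
    value = row.get(INDEXED_FIELD)
    if value is not None:
        index_table.setdefault(value, set()).add(rowkey)
    main_table[rowkey] = row


def query_by_field(value):
    """Step 1: get rowkeys from the index table.
    Step 2: fetch the rows from the main table by rowkey."""
    rowkeys = sorted(index_table.get(value, set()))
    return [main_table[rk] for rk in rowkeys]


put_with_observer("r1", {"city": "Wuhan", "name": "a"})
put_with_observer("r2", {"city": "Beijing", "name": "b"})
put_with_observer("r3", {"city": "Wuhan", "name": "c"})
wuhan_rows = query_by_field("Wuhan")
```

Because the index write happens inside the insert path, every insert pays for it, which is the synchronous cost discussed in the problems listed next.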

This approach has several major problems:

(1) Coprocessors are unstable

In the current version of HBase, our own tests of generating an index through a coprocessor showed that once the code throws an exception during index creation, the entire Hadoop cluster goes down.

(2) Indexing slows down data insertion

Because inserting data and building the index form a synchronous process, the indexing operation significantly reduces the insertion speed.

(3) The fields to index must be decided before the data is inserted and cannot be changed later

Another problem with indexing at insertion time is that all fields requiring an index must be decided once, up front. If an index is later needed on a new field, the data already inserted will not be indexed again.

(4) One index table per indexed field is inefficient

For flexibility when using the indexes later, a separate index table is usually created for each individual field, with the field value as the rowkey of the index table and the rowkey of the original table as a column of the index table. Although this makes flexible multi-condition queries convenient, it increases the number of index tables; moreover, when a query has many conditions, several index-table lookups are needed, which considerably affects the query response.
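To make the cost concrete, here is a small sketch of one index table per field, where a multi-condition query needs one index lookup per condition followed by an intersection of the rowkey sets. The table contents, field names, and function names are hypothetical, and dictionaries stand in for HBase tables.

```python
from collections import defaultdict

# Hypothetical main table: rowkey -> row.
main_table = {
    "r1": {"city": "Wuhan", "type": "sensor", "status": "ok"},
    "r2": {"city": "Wuhan", "type": "camera", "status": "ok"},
    "r3": {"city": "Beijing", "type": "sensor", "status": "ok"},
    "r4": {"city": "Wuhan", "type": "sensor", "status": "error"},
}


def build_index_tables(table, fields):
    """One index table per field: field value -> set of main-table rowkeys."""
    index_tables = {field: defaultdict(set) for field in fields}
    for rowkey, row in table.items():
        for field in fields:
            if field in row:
                index_tables[field][row[field]].add(rowkey)
    return index_tables


def multi_condition_query(index_tables, conditions):
    """Return matching rowkeys and the number of index-table lookups used."""
    lookups = 0
    result = None
    for field, value in conditions.items():
        lookups += 1  # each condition costs one separate index-table query
        rowkeys = index_tables[field].get(value, set())
        result = set(rowkeys) if result is None else result & rowkeys
    return sorted(result or set()), lookups


tables = build_index_tables(main_table, ["city", "type", "status"])
rowkeys, lookups = multi_condition_query(
    tables, {"city": "Wuhan", "type": "sensor", "status": "ok"}
)
```

The number of lookups grows with the number of conditions, and each lookup may return a large rowkey set that has to be held and intersected before the main table is touched.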

2. Filtering on the server side with HBase's built-in filters

HBase ships with many types of filter, and we can also define our own. When a filter is used in a query, HBase filters the query results on the server side of the cluster according to the filter's logic.

This approach, however, also has a problem: a filter still has to scan the data, which is inefficient.

Although filtering takes place on the server side, all data matching the rowkey query condition must still be read out and then scanned again to discard the data that fails the filter condition. When the original query returns a large volume of data, this process consumes a great deal of server memory and the scan takes a long time; the time spent on this step alone already fails the requirement of a seconds-level query.
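The scanning cost can be illustrated with a toy model. The sketch below is not the HBase filter API; `scan_with_filter`, the rowkey layout, and the `status` column are assumptions made for illustration. It counts how many rows are examined to produce a handful of matches.

```python
def scan_with_filter(table, start_key, stop_key, predicate):
    """Toy model of a filtered scan: every row in the rowkey range is
    examined, and the predicate is applied to each one."""
    examined = 0
    matches = []
    for rowkey in sorted(table):
        if start_key <= rowkey < stop_key:
            examined += 1
            if predicate(table[rowkey]):
                matches.append(rowkey)
    return matches, examined


# 1000 hypothetical rows, of which only every hundredth has status "error".
table = {
    f"row{i:04d}": {"status": "error" if i % 100 == 0 else "ok"}
    for i in range(1000)
}
matches, examined = scan_with_filter(
    table, "row0000", "row1000", lambda row: row["status"] == "error"
)
```

In this toy run all 1000 rows in the range are examined to return 10 matches, so the work is proportional to the size of the rowkey range rather than to the size of the result.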

Because certain characteristics of the two approaches above cannot meet our needs, we propose a Solr-based HBase seconds-level query scheme.

Summary of the invention

The object of the present invention is to overcome the shortcomings of the prior art by proposing a Solr-based HBase seconds-level query scheme.

A Solr-based HBase seconds-level query scheme, comprising the following steps:

Step 1: insert the raw data into the HBase column-oriented database, keeping HBase's original mode of operation; no other changes are needed;

Step 2: obtain the raw data and store it on the Solr server in Solr's own document format. Once a document has been created, Solr analyzes it automatically; after the analysis, Solr builds an inverted index with the segmented words as keys and the documents as values, which forms the index. MapReduce is invoked on a schedule to incrementally update the index built in Solr;

Step 3: at query time, access the Solr server, build a separate index on each field to be queried, search the index, obtain the rowkeys from the index, and then query the HBase column-oriented database by rowkey to produce the required result data.
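The three steps above can be sketched end to end with in-memory stand-ins. Nothing here calls HBase or Solr: `hbase_table`, `inverted_index`, the simple regular-expression tokenizer, and the field name `title` are hypothetical and only illustrate the data flow of insert, index, and index-then-rowkey query.

```python
import re
from collections import defaultdict

hbase_table = {}  # stand-in for the HBase table: rowkey -> {column: value}
# Stand-in for the Solr index: field -> term -> set of rowkeys.
inverted_index = defaultdict(lambda: defaultdict(set))


def tokenize(text):
    """Very simple word segmentation (an assumption, not Solr's analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def insert_raw(rowkey, row):
    """Step 1: write the raw data; the storage side is left unchanged."""
    hbase_table[rowkey] = row


def index_rows(rowkeys, fields):
    """Step 2: segment each field and record term -> rowkey."""
    for rowkey in rowkeys:
        row = hbase_table[rowkey]
        for field in fields:
            for term in tokenize(str(row.get(field, ""))):
                inverted_index[field][term].add(rowkey)


def query(field, term):
    """Step 3: search the index for rowkeys, then read rows by rowkey."""
    rowkeys = sorted(inverted_index[field].get(term.lower(), set()))
    return [(rowkey, hbase_table[rowkey]) for rowkey in rowkeys]


insert_raw("r1", {"title": "HBase secondary index"})
insert_raw("r2", {"title": "Solr full-text search"})
insert_raw("r3", {"title": "Querying HBase with Solr"})
index_rows(["r1", "r2", "r3"], ["title"])
hits = query("title", "hbase")
```

Only the rowkeys travel from the index to the storage layer, so the second step is a direct rowkey read rather than a scan.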

Preferably, after Solr builds the index, the index is compressed and stored on the disk of the Solr server, and a Map can be used to cache part of it.

Preferably, the Solr index allows the tokenizer to be optimized: word segmentation can be customized for the business scenario so as to extract industry-specific terms.

Preferably, Solr has a built-in pagination function and returns the rowkeys of one page of data at a time.

Preferably, Solr can be combined with a mature in-memory database, with the index stored directly in the in-memory database.

Preferably, the index-building operation of Solr can also be placed in an HBase coprocessor for execution.

The Solr-based HBase seconds-level query scheme proposed by the present invention offers fast and accurate search. By combining Solr with HBase it retrieves massive data within seconds. Solr's built-in pagination function returns the rowkeys of one page of data at a time; because the number of records on each page is very limited, the subsequent HBase query by the rowkeys of that page responds quickly and can be kept at the millisecond level.
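The pagination step can be sketched as follows. Solr exposes paging through its `start` and `rows` query parameters; the sketch below only imitates that behavior with list slicing over a hypothetical list of matching rowkeys, and `fetch_page` and the sample table are assumptions made for illustration.

```python
def page_of_rowkeys(matching_rowkeys, start, rows):
    """Imitates paging with an offset (start) and a page size (rows)."""
    return matching_rowkeys[start:start + rows]


def fetch_page(table, matching_rowkeys, page, page_size):
    """Fetch only the rows of one page from the storage stand-in by rowkey."""
    rowkeys = page_of_rowkeys(matching_rowkeys, page * page_size, page_size)
    return [table[rowkey] for rowkey in rowkeys]


# Hypothetical storage stand-in with 95 rows that all match the query.
table = {f"row{i:03d}": {"n": i} for i in range(95)}
matching = sorted(table)

first_page = fetch_page(table, matching, page=0, page_size=10)
last_page = fetch_page(table, matching, page=9, page_size=10)
```

However many rows match in total, each request reads at most `page_size` rows by rowkey, which is why the storage-side lookup stays small.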

Brief description of the drawings

Fig. 1 is a flow chart of data storage;

Fig. 2 is a flow chart of data querying.

Detailed description of the invention

The present invention is further explained below with reference to a specific embodiment.

With reference to Figs. 1-2, the Solr-based HBase seconds-level query scheme proposed by the present invention comprises the following steps:

Step 2: MapReduce is invoked on a schedule to incrementally update the index in Solr. The raw data inserted into the HBase column-oriented database is first obtained and stored on the Solr server in Solr's own document format. Once a document has been created, Solr analyzes it automatically; this involves segmenting the content of the document into words with a specific word-segmentation technique. After segmentation, Solr builds an inverted index with the segmented words as keys and the documents as values;

Step 3: at query time, access the Solr server and build a separate index on each field to be queried. After the index is built, Solr compresses it and stores it on the disk of the Solr server, and a Map can be used to cache part of it. The index is then searched and the rowkeys are obtained from it; Solr's built-in pagination function returns the rowkeys of one page of data at a time. The HBase column-oriented database is then queried by rowkey to produce the required result data.
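The scheduled incremental update can be sketched as below. The text does not say how the increment is determined, so this sketch assumes each row carries a write timestamp and that every run indexes only rows newer than the last watermark. `IncrementalIndexer`, the tokenizer, and the sample data are hypothetical, and a plain loop stands in for the MapReduce job.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Very simple word segmentation (an assumption, not Solr's analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class IncrementalIndexer:
    """Indexes only rows written after the previous run (assumed policy)."""

    def __init__(self, fields):
        self.fields = fields
        self.index = defaultdict(set)  # (field, term) -> set of rowkeys
        self.watermark = 0             # highest timestamp indexed so far

    def run(self, table):
        """table maps rowkey -> (timestamp, row); returns rows indexed."""
        new_rows = [
            (rowkey, ts, row)
            for rowkey, (ts, row) in table.items()
            if ts > self.watermark
        ]
        for rowkey, _, row in new_rows:
            for field in self.fields:
                for term in tokenize(str(row.get(field, ""))):
                    self.index[(field, term)].add(rowkey)
        if new_rows:
            self.watermark = max(ts for _, ts, _ in new_rows)
        return len(new_rows)


table = {
    "r1": (1, {"title": "pump station alarm"}),
    "r2": (2, {"title": "valve alarm cleared"}),
}
indexer = IncrementalIndexer(["title"])
first_run = indexer.run(table)           # indexes r1 and r2
table["r3"] = (3, {"title": "pump restarted"})
second_run = indexer.run(table)          # indexes only r3
```

This sketch only ever adds terms; a row that is overwritten with a newer timestamp would be re-indexed, but its old terms would stay in the index, so a real job would also need to handle updates and deletions.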

In the present invention, the index-building operation of Solr can also be placed in an HBase coprocessor for execution, and Solr can be combined with a mature in-memory database, with the index stored directly in the in-memory database.

The above is merely a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. A Solr-based HBase seconds-level query scheme, characterized in that it comprises the following steps:

Step 1: insert the raw data into the HBase column-oriented database, keeping HBase's original mode of operation; no other changes are needed;

Step 2: obtain the raw data and store it on the Solr server in Solr's own document format. Once a document has been created, Solr analyzes it automatically; after the analysis, Solr builds an inverted index with the segmented words as keys and the documents as values, which forms the index. MapReduce is invoked on a schedule to incrementally update the index built in Solr;

Step 3: at query time, access the Solr server, build a separate index on each field to be queried, search the index, obtain the rowkeys from the index, and then query the HBase column-oriented database by rowkey to produce the required result data.

2. The Solr-based HBase seconds-level query scheme according to claim 1, characterized in that after Solr builds the index, the index is compressed and stored on the disk of the Solr server, and a Map can be used to cache part of it.

3. The Solr-based HBase seconds-level query scheme according to claim 1, characterized in that the Solr index allows the tokenizer to be optimized: word segmentation can be customized for the business scenario so as to extract industry-specific terms.

4. The Solr-based HBase seconds-level query scheme according to claim 1, characterized in that Solr has a built-in pagination function and returns the rowkeys of one page of data at a time.

5. The Solr-based HBase seconds-level query scheme according to claim 1, characterized in that Solr can be combined with a mature in-memory database, with the index stored directly in the in-memory database.

6. The Solr-based HBase seconds-level query scheme according to claim 1, characterized in that the index-building operation of Solr can also be placed in an HBase coprocessor for execution.