CN106326429A - Hbase second-level query scheme based on solr - Google Patents

Hbase second-level query scheme based on solr

Info

Publication number
CN106326429A
Authority
CN
China
Prior art keywords
solr
hbase
index
data
rowkey
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610723701.7A
Other languages
Chinese (zh)
Inventor
童浩
杨凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Optics Valley Information Technologies Co Ltd
Original Assignee
Wuhan Optics Valley Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-08-25
Filing date
2016-08-25
Publication date
2017-01-11
Application filed by Wuhan Optics Valley Information Technologies Co Ltd
Priority to CN201610723701.7A
Publication of CN106326429A
Legal status: Pending

Abstract

The invention discloses an HBase second-level query scheme based on Solr, that is, a scheme that answers queries within seconds. The scheme comprises the following steps: inserting raw data into the HBase column-oriented database; periodically invoking MapReduce to incrementally update the index in Solr, which obtains the raw data and stores it on the Solr server in Solr's own document format; and, at query time, accessing the Solr server, searching the index first to obtain rowkeys, and then fetching the required result data from the main HBase table by rowkey. The scheme offers fast and accurate search. By combining Solr with HBase, massive data sets can be retrieved within seconds. Solr's built-in pagination returns the rowkeys for one page of data at a time; because each page holds only a small number of rows, the subsequent HBase lookup by those rowkeys responds quickly and can be kept at the millisecond level.

Description

An HBase second-level query scheme based on Solr
Technical field
The present invention relates to the field of HBase technology, and in particular to an HBase second-level query scheme based on Solr.
Background art
Solr is a complete search service from Apache built on Lucene. It has two core components: an indexing component and a search component. The indexing component builds an index over the data that the search application needs indexed, and the search component queries that index in response to client requests. Solr is a high-performance full-text search server developed in Java 5 on top of Lucene. It extends Lucene with a richer query language, is configurable and extensible, has optimized query performance, and provides a complete administration interface, which makes it an excellent full-text search engine. Documents are added to a search collection as XML over HTTP, and the collection is queried over HTTP as well, with the response returned as XML or JSON. Its key features include efficient and flexible caching, vertical search, highlighted search results, improved availability through index replication, a powerful data schema for defining fields and types and configuring text analysis, and a web-based administration interface.
HBase is the Hadoop family's distributed storage system for massive data. When data stored in HBase is queried by rowkey, responses arrive within seconds, which gives a fairly good user experience. In more complex scenarios, however, such as queries on multiple conditions, the solutions that HBase offers are far from ideal.
For multi-condition queries, HBase itself currently has two mainstream solutions:
1. Manually building index tables with a coprocessor when inserting data
HBase has two kinds of coprocessor: Observer and Endpoint. An Observer is similar to a trigger in a relational database, and an Endpoint is similar to a stored procedure in a relational database.
An Observer is used when indexing a table with a coprocessor. An Observer operation is added to inserts into the HBase table, so that before each row is inserted, our custom business logic is called to write a record for each indexed field into the index table.
A multi-condition query against HBase then takes two steps. The first step queries the index table with the query conditions to obtain the rowkeys of the matching results; the second step uses those rowkeys to fetch the required data from the main table.
This approach has several significant problems:
(1) Coprocessors are unstable
In the current version of HBase, our own tests of index generation through a coprocessor showed that if the code throws an exception while the index is being built, the entire Hadoop cluster goes down.
(2) Indexing slows down data insertion
Because inserting data and building the index are one synchronous process, the indexing work slows inserts considerably.
(3) The indexed fields must be fixed before data is inserted and cannot be changed later
A further problem with indexing at insert time is that all fields to be indexed must be decided up front. If an index on a new field is needed later, data inserted earlier is not re-indexed.
(4) One index table per indexed field is inefficient
To keep later index use flexible, a separate index table is usually created for each individual field, with the field value as the index table's rowkey and the original table's rowkey as a field of the index table. Although this makes flexible multi-condition queries possible, it increases the number of index tables, and a query with many conditions requires many index-table lookups, which noticeably hurts response time.
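The index-table approach described above can be sketched in a few lines. This is a minimal in-memory illustration, not HBase code: plain dictionaries stand in for the main table and the per-field index tables, and the names `put`, `query`, and `INDEXED_FIELDS`, as well as the sample rows, are hypothetical.

```python
from collections import defaultdict

# Main table: rowkey -> row (a dict of column values). Stands in for an HBase table.
main_table = {}
# One index table per indexed field: field -> {field value -> set of main-table rowkeys}.
index_tables = defaultdict(lambda: defaultdict(set))
# The indexed fields must be chosen before any data is inserted (problem 3).
INDEXED_FIELDS = ("city", "status")


def put(rowkey, row):
    """Insert a row and synchronously write its index records (the Observer's role)."""
    main_table[rowkey] = row
    for field in INDEXED_FIELDS:
        if field in row:
            index_tables[field][row[field]].add(rowkey)


def query(conditions):
    """Step 1: one index-table lookup per condition. Step 2: fetch rows by rowkey."""
    rowkey_sets = [index_tables[f].get(v, set()) for f, v in conditions.items()]
    matching = set.intersection(*rowkey_sets) if rowkey_sets else set()
    return [main_table[rk] for rk in sorted(matching)]


put("r1", {"city": "Wuhan", "status": "open"})
put("r2", {"city": "Wuhan", "status": "closed"})
put("r3", {"city": "Beijing", "status": "open"})
result = query({"city": "Wuhan", "status": "open"})
```

Each condition in `query` costs one index-table lookup, which is why queries with many conditions slow down under this design.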
2. Filtering on the server side with HBase's built-in filters
HBase ships with many types of filter, and custom filters can also be written. When a filter is used in a query, HBase applies the filter's logic to the result data on the server side of the cluster.
This approach also has a problem, however: a filter still has to scan the data, so it is inefficient.
Although filtering happens on the server side, all data matching the rowkey query condition must still be read out and then scanned to discard the rows that fail the filter condition. When the original query covers a large volume of data, this consumes a lot of server memory and the scan takes a long time; the time spent on this step alone already rules out a response within seconds.
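To make the cost of the filter approach concrete, here is a small sketch under stated assumptions: a dictionary stands in for an HBase table, and `scan_with_filter` is a hypothetical helper, not an HBase API. It counts how many rows are examined versus how many are returned.

```python
def scan_with_filter(table, start_row, stop_row, predicate):
    """Scan rowkeys in [start_row, stop_row) and keep rows that pass the predicate.

    Returns (matching_rowkeys, rows_examined). Every row in the rowkey range is
    read and tested, even if only a handful satisfy the filter.
    """
    examined = 0
    matches = []
    for rowkey in sorted(table):
        if start_row <= rowkey < stop_row:
            examined += 1
            if predicate(table[rowkey]):
                matches.append(rowkey)
    return matches, examined


# Hypothetical data: 10,000 rows, of which one in a thousand is "open".
table = {
    f"row{i:05d}": {"status": "open" if i % 1000 == 0 else "closed"}
    for i in range(10_000)
}
matches, examined = scan_with_filter(
    table, "row00000", "row10000", lambda row: row["status"] == "open"
)
```

Here `examined` grows with the size of the rowkey range while `matches` stays small, which is the inefficiency the text describes.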
Because both of the above solutions have characteristics that fail to meet our needs, we propose an HBase second-level query scheme based on Solr.
Summary of the invention
The purpose of the invention is to overcome the shortcomings of the prior art by proposing an HBase second-level query scheme based on Solr.
An HBase second-level query scheme based on Solr comprises the following steps:
Step 1: insert the raw data into the HBase column-oriented database, keeping HBase's original mode of operation; no other changes are needed;
Step 2: obtain the raw data and store it on the Solr server in Solr's own document format. Once a document has been created, Solr analyzes it automatically; after analysis, Solr builds an inverted index with the segmented terms as keys and the documents as values, thereby forming the index. MapReduce is invoked on a schedule to incrementally update the index built in Solr;
Step 3: at query time, access the Solr server. Each field to be queried is indexed separately. Search the index, obtain the rowkeys from it, and then query the HBase column-oriented database by rowkey to produce the required result data.
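The three steps can be illustrated with a minimal in-memory sketch. It does not use Solr or HBase: a dictionary stands in for the HBase table, a nested dictionary stands in for the Solr inverted index, and the regular-expression tokenizer, the function names, and the sample rows are all assumptions made for illustration.

```python
import re
from collections import defaultdict

hbase_table = {}  # rowkey -> row; stands in for the HBase main table
inverted_index = defaultdict(lambda: defaultdict(set))  # field -> term -> rowkeys


def tokenize(text):
    """Very simple analysis step: lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def index_row(rowkey, row, fields):
    """Step 2: add one row to the per-field inverted index (term -> rowkeys)."""
    for field in fields:
        for term in tokenize(str(row.get(field, ""))):
            inverted_index[field][term].add(rowkey)


def search(field, term):
    """Step 3a: look up the index and return matching rowkeys only."""
    return sorted(inverted_index[field].get(term.lower(), set()))


def query(field, term):
    """Step 3b: fetch the full rows from the main table by rowkey."""
    return [hbase_table[rk] for rk in search(field, term)]


# Step 1: raw data goes into the main table unchanged.
hbase_table["u001"] = {"title": "HBase secondary index"}
hbase_table["u002"] = {"title": "Solr full-text search"}
hbase_table["u003"] = {"title": "Solr index for HBase"}
for rowkey, row in hbase_table.items():
    index_row(rowkey, row, ["title"])

solr_hits = search("title", "solr")
```

The point of the split is that the index returns only rowkeys, so the main table is touched only through direct rowkey lookups, which is the access pattern HBase handles fastest.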
Preferably, after Solr builds the index, it compresses the index and stores it on the Solr server's disk, and can also use a Map to cache part of it.
Preferably, the tokenizer used for the Solr index can be optimized: word segmentation is customized for the business scenario so that industry-specific terms are extracted.
Preferably, Solr has built-in pagination and can return the rowkeys for one page of data at a time.
Preferably, Solr can be combined with a mature in-memory database, with the index stored directly in the in-memory database.
Preferably, Solr's index-building operation can also be executed inside an HBase coprocessor.
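The tokenizer customization mentioned above is not specified further in the source. As one possible illustration, the sketch below uses forward maximum matching against a dictionary, a common technique for segmenting text written without spaces; the technique, the function name `segment`, and the sample dictionary entries are assumptions, not the patent's method.

```python
def segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    max_len = max(map(len, dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical dictionaries: the second one adds an industry-specific term.
general_words = {"光纤", "设备"}
with_domain_term = general_words | {"光纤通信"}

plain = segment("光纤通信设备", general_words)
customized = segment("光纤通信设备", with_domain_term)
```

With the industry term added to the dictionary, the phrase is indexed as one term instead of being broken into fragments, so a search for that term matches exactly.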
The HBase second-level query scheme based on Solr proposed by the invention offers fast and accurate search. By combining Solr with HBase, massive data sets can be retrieved within seconds. Solr's built-in pagination returns the rowkeys for one page of data at a time; because each page holds only a small number of rows, the subsequent HBase lookup by those rowkeys responds quickly and can be kept at the millisecond level.
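The pagination step can be sketched as follows. This is an in-memory illustration under stated assumptions: `page_of_rowkeys` and `fetch_page` are hypothetical helpers, the arguments `start` and `rows` are named after Solr's `start` and `rows` query parameters, and a dictionary stands in for the HBase table.

```python
def page_of_rowkeys(matching_rowkeys, start, rows):
    """Return one page of rowkeys: `rows` entries beginning at offset `start`."""
    return matching_rowkeys[start:start + rows]


def fetch_page(table, matching_rowkeys, page, page_size):
    """Fetch only the rows of one page from the main table, by rowkey."""
    rowkeys = page_of_rowkeys(matching_rowkeys, page * page_size, page_size)
    return [(rk, table[rk]) for rk in rowkeys]


# Hypothetical data: 95 rows, all of which match the search.
table = {f"rk{i:03d}": {"n": i} for i in range(95)}
matching = sorted(table)

first_page = fetch_page(table, matching, page=0, page_size=10)
last_page = fetch_page(table, matching, page=9, page_size=10)
```

However many rows match in total, each request touches at most `page_size` rowkeys in the main table, which is why the per-page lookup stays fast.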
Brief description of the drawings
Fig. 1 is a flowchart of data storage;
Fig. 2 is a flowchart of data query.
Detailed description of the invention
The invention is further explained below with reference to specific embodiments.
With reference to Figs. 1-2, the HBase second-level query scheme based on Solr proposed by the invention comprises the following steps:
Step 1: insert the raw data into the HBase column-oriented database, keeping HBase's original mode of operation; no other changes are needed;
Step 2: MapReduce is invoked on a schedule to incrementally update the index in Solr. The raw data inserted into the HBase column-oriented database is first obtained and stored on the Solr server in Solr's own document format. Once a document has been created, Solr analyzes it automatically, which involves segmenting the document's content into words with a specific word-segmentation technique. After segmentation, Solr builds an inverted index with the segmented terms as keys and the documents as values;
Step 3: at query time, access the Solr server. Each field to be queried is indexed separately. After the index is built, Solr compresses it and stores it on the Solr server's disk, and can also use a Map to cache part of it. Search the index and obtain the rowkeys from it; Solr's built-in pagination returns the rowkeys for one page of data at a time. Then query the HBase column-oriented database by rowkey to produce the required result data.
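The scheduled incremental update in step 2 can be sketched as a map phase and a reduce phase driven by a timestamp watermark. The source does not say how new rows are identified, so the watermark, the `(timestamp, text)` cell layout, and all function names here are assumptions; the code runs in memory and does not use Hadoop.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit one (term, rowkey) pair per whitespace-separated term."""
    for rowkey, (timestamp, text) in rows:
        for term in text.lower().split():
            yield term, rowkey


def reduce_phase(pairs, index):
    """Reduce: group rowkeys by term into the inverted index."""
    for term, rowkey in pairs:
        index[term].add(rowkey)


def incremental_update(table, index, watermark):
    """Index only rows written after `watermark`; return the new watermark."""
    new_rows = [(rk, cell) for rk, cell in table.items() if cell[0] > watermark]
    reduce_phase(map_phase(new_rows), index)
    return max((cell[0] for _, cell in new_rows), default=watermark)


# Hypothetical table: rowkey -> (write timestamp, text).
table = {"r1": (100, "alpha beta"), "r2": (200, "beta gamma")}
index = defaultdict(set)

watermark = incremental_update(table, index, 0)          # first scheduled run
table["r3"] = (300, "gamma delta")                       # new data arrives
watermark = incremental_update(table, index, watermark)  # next run sees only r3
```

Because indexing runs on a schedule rather than inside the insert path, inserts are not slowed by it; the trade-off is that rows written since the last run are not yet searchable. This sketch only adds terms, so handling updates or deletes of existing rows would need extra logic.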
In the invention, Solr's index-building operation can also be executed inside an HBase coprocessor, and Solr can be combined with a mature in-memory database, with the index stored directly in the in-memory database.
The HBase second-level query scheme based on Solr proposed by the invention offers fast and accurate search. By combining Solr with HBase, massive data sets can be retrieved within seconds. Solr's built-in pagination returns the rowkeys for one page of data at a time; because each page holds only a small number of rows, the subsequent HBase lookup by those rowkeys responds quickly and can be kept at the millisecond level.
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.

Claims (6)

CN201610723701.7A | Priority date 2016-08-25 | Filing date 2016-08-25 | Hbase second-level query scheme based on solr | Pending | CN106326429A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201610723701.7A (CN106326429A (en)) | 2016-08-25 | 2016-08-25 | Hbase second-level query scheme based on solr

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201610723701.7A (CN106326429A (en)) | 2016-08-25 | 2016-08-25 | Hbase second-level query scheme based on solr

Publications (1)

Publication Number | Publication Date
CN106326429A (en) | 2017-01-11

Family

ID=57791438

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201610723701.7A (Pending, CN106326429A (en)) | Hbase second-level query scheme based on solr | 2016-08-25 | 2016-08-25

Country Status (1)

Country | Link
CN (1) | CN106326429A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102426609A (en) * | 2011-12-28 | 2012-04-25 | 厦门市美亚柏科信息股份有限公司 | Index generation method and index generation device based on MapReduce programming architecture
KR20140012377A (en) * | 2012-07-20 | 2014-02-03 | 유넷시스템주식회사 | Method of forming index file, method of searching data and system for managing data using dictionary index file, recoding medium
CN104102710A (en) * | 2014-07-15 | 2014-10-15 | 浪潮(北京)电子信息产业有限公司 | Massive data query method
CN104834688A (en) * | 2015-04-20 | 2015-08-12 | 北京奇艺世纪科技有限公司 | Secondary index establishment method and device
CN105138592A (en) * | 2015-07-31 | 2015-12-09 | 武汉虹信技术服务有限责任公司 | Distributed framework-based log data storing and retrieving method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
施磊磊: "基于Hadoop 和HBase 的分布式索引模型的研究", 《信息技术》*
魏勇等: "基于GeoNames和Solr的地名数据全文检索", 《测绘工程》*

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106909671A (en) * | 2017-02-28 | 2017-06-30 | 湖南蚁坊软件股份有限公司 | A kind of method and system of NoSQL databases condition query
WO2018209574A1 (en) * | 2017-05-16 | 2018-11-22 | 深圳中兴力维技术有限公司 | Alarm data query method and apparatus
CN107239517A (en) * | 2017-05-23 | 2017-10-10 | 中国联合网络通信集团有限公司 | Many condition searching method and device based on Hbase databases
CN107239517B (en) * | 2017-05-23 | 2020-09-29 | 中国联合网络通信集团有限公司 | Multi-condition search method and device based on Hbase database
CN109144995A (en) * | 2017-06-26 | 2019-01-04 | 辽宁艾特斯智能交通技术有限公司 | A kind of highway magnanimity transaction data search method
CN107656985A (en) * | 2017-09-11 | 2018-02-02 | 北京京东尚科信息技术有限公司 | Web page interrogation method and its system
CN110109870A (en) * | 2018-01-24 | 2019-08-09 | 江苏友上科技实业有限公司 | A kind of mass data quick retrieval system based on Solr
CN108573063A (en) * | 2018-04-27 | 2018-09-25 | 宁波银行股份有限公司 | A kind of data query method and system
CN109471893A (en) * | 2018-10-24 | 2019-03-15 | 上海连尚网络科技有限公司 | Querying method, equipment and the computer readable storage medium of network data
CN109299143A (en) * | 2018-11-28 | 2019-02-01 | 重庆邮电大学 | A Quick Knowledge Indexing Method for Data Interoperability Testing Knowledge Base Based on Redis Cache
CN109299143B (en) * | 2018-11-28 | 2022-03-22 | 重庆邮电大学 | Knowledge fast indexing method of data interoperation test knowledge base based on Redis cache
CN109697200A (en) * | 2018-12-18 | 2019-04-30 | 厦门商集网络科技有限责任公司 | A kind of HBase secondary index method and apparatus based on Solr
CN110232106A (en) * | 2019-04-26 | 2019-09-13 | 安徽四创电子股份有限公司 | A kind of mass data storage and method for quickly retrieving based on MongoDB and Solr
CN110347722A (en) * | 2019-07-11 | 2019-10-18 | 软通智慧科技有限公司 | Data acquisition method, device, equipment and storage medium based on HBase
CN111078731A (en) * | 2019-11-25 | 2020-04-28 | 国网冀北电力有限公司 | Hbase-based power grid operation data collaborative query method and device and storage medium
CN111488379A (en) * | 2020-04-17 | 2020-08-04 | 焦点科技股份有限公司 | Method for optimizing Hbase large data query
CN111488379B (en) * | 2020-04-17 | 2022-07-19 | 焦点科技股份有限公司 | Method for optimizing Hbase large data query
CN112506915A (en) * | 2020-10-27 | 2021-03-16 | 百果园技术(新加坡)有限公司 | Application data management system, processing method and device and server
CN112506915B (en) * | 2020-10-27 | 2024-05-10 | 百果园技术(新加坡)有限公司 | Application data management system, processing method and device and server
CN112463832A (en) * | 2020-11-27 | 2021-03-09 | 苏州浪潮智能科技有限公司 | Inquiry method and device based on hbase-indexer and electronic equipment
CN112463832B (en) * | 2020-11-27 | 2022-10-25 | 苏州浪潮智能科技有限公司 | A query method, device and electronic device based on hbase-indexer
CN113297273A (en) * | 2021-06-09 | 2021-08-24 | 北京百度网讯科技有限公司 | Method and device for querying metadata and electronic equipment
CN113297273B (en) * | 2021-06-09 | 2024-03-01 | 北京百度网讯科技有限公司 | Method and device for inquiring metadata and electronic equipment
CN113407785A (en) * | 2021-06-11 | 2021-09-17 | 西北工业大学 | Data processing method and system based on distributed storage system

Legal Events

Code | Title | Description
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2017-01-11
