CN111488379A - Method for optimizing Hbase large data query - Google Patents

Method for optimizing Hbase large data query

Publication number
CN111488379A
Authority
CN
China
Prior art keywords
query
hbase
data
rowkey
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010305095.3A
Other languages
Chinese (zh)
Other versions
CN111488379B (en)
Inventor
储明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Focus Technology Co Ltd
Original Assignee
Focus Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Focus Technology Co Ltd
Priority to CN202010305095.3A
Publication of CN111488379A
Application granted
Publication of CN111488379B
Legal status: Active
Anticipated expiration

Abstract

Translated from Chinese

The invention discloses a method for optimizing HBase big data queries, comprising the steps of: storing HBase rowKeys and update records in a database, storing basic information about HBase columns in the database, building a query index based on Lucene, defining custom query views, managing data labels, and providing a systematic query page. The invention aims to solve the problem that HBase, when storing large volumes of data, makes efficient and convenient complex queries difficult.

Figure 202010305095

Description

Translated from Chinese
A method for optimizing HBase big data queries

Technical field

The invention relates to the technical field of big data search, and in particular to a method for optimizing HBase big data queries.

Background

With the development of the Internet, the data held by Internet companies has grown exponentially. Storing data in the Hadoop ecosystem has become the first choice for most companies, and HBase, often described as a distributed database, is one of the most commonly used storage options.

Because of its storage model, HBase supports rowKey lookups well but is very inefficient for filter queries with complex conditions. How to optimize HBase big data queries is therefore a problem that has to be addressed.

The usual industry solution is to create additional tables in HBase that act as secondary indexes, mapping column data to rowKeys so that data can be located faster. However, as request complexity grows, more and more index tables are created, data redundancy becomes more serious, and storage pressure increases. Optimization schemes for this exist on the market, but each has its own shortcomings.

Summary of the invention

The technical problem to be solved by the invention is to improve the performance of complex HBase queries and reduce the storage and computation load on HBase, by providing a method for optimizing HBase big data queries.

To solve the above technical problem, the invention provides a method for optimizing HBase big data queries, characterized in that it comprises the steps of storing HBase rowKeys and update records in a database, storing basic information about HBase columns in the database, building a query index based on Lucene, defining custom query views, managing data labels, and providing a systematic query page, specifically:

S1, store HBase rowKeys and update records in a database:

Before data is added to or deleted from HBase, a copy of the rowKey and the related operation information is stored in the database. The record contains the following fields: rowKey, HBase table name, logical-delete flag, add time, and update time. It provides the base data for building the Lucene query index;

When the data in a query-condition column in HBase changes, a journal record is written back to the database. It contains the fields rowKey, HBase table name, deleted flag, and add time, and provides the base data for incremental updates to, or incremental deletions from, the Lucene index;

S2, store basic information about HBase columns in the database: a definition mapping is stored in the database, covering the table name, column family, and column name of the HBase data, together with a maintained Chinese-language description of each, for display on the query page;

S3, build a query index based on Lucene: an end-of-day application uses the rowKeys and update records from S1 and the column information from S2 to fetch the corresponding data from HBase and submits it to a Lucene-based search engine to update the index. The search engine exposes a query interface that accepts SQL-like statements;

S4, custom query views: a custom query view consists of two parts, a view definition and condition configurations. The view definition determines which fields are queried, and a condition configuration constrains which data is returned. Views and conditions have a one-to-many relationship and are stored in the database. Custom query views are configured with SQL-like statements, consistent with the query interface of the search engine in S3;

S5, data label management: query results can be tagged in batches with user-defined labels, and the label-to-data relationship is stored in the database. Labeled data can be retrieved directly by filtering on the label, or by combining the label with query conditions;

S6, systematic query page: a web service application is built that integrates and displays the data from S1, S2, and S3 and the functions from S4 and S5, forming a systematic query page.
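The three metadata tables implied by S1 and S2 can be sketched as follows. This is only an illustration: the table names hbase_rowkey_info, hbase_update_record, and hbase_column_info appear in the patent text, but every column name, the use of SQLite, and the helper functions create_metadata_db and table_names are assumptions made for the example.

```python
import sqlite3

# Table names follow the patent text; all column names are assumptions.
SCHEMA = """
CREATE TABLE hbase_rowkey_info (
    rowkey      TEXT NOT NULL,
    hbase_table TEXT NOT NULL,
    deleted     INTEGER NOT NULL DEFAULT 0,  -- logical-delete flag
    added_at    TEXT NOT NULL,
    updated_at  TEXT NOT NULL,
    PRIMARY KEY (hbase_table, rowkey)
);
CREATE TABLE hbase_update_record (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    rowkey      TEXT NOT NULL,
    hbase_table TEXT NOT NULL,
    deleted     INTEGER NOT NULL DEFAULT 0,  -- whether the change was a delete
    added_at    TEXT NOT NULL
);
CREATE TABLE hbase_column_info (
    hbase_table    TEXT NOT NULL,
    column_family  TEXT NOT NULL,
    column_name    TEXT NOT NULL,
    display_name   TEXT NOT NULL,            -- description shown on the query page
    is_query_field INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (hbase_table, column_family, column_name)
);
"""


def create_metadata_db(path=":memory:"):
    """Create the metadata tables in a SQLite database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """Return the names of the metadata tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE 'hbase_%' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

The is_query_field column is one possible way to record the distinction between query fields and data fields that the method relies on when deciding what to index.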

In S1, the database stores all HBase rowKey information and records when each rowKey was added or deleted, so that the end-of-day application can find added and deleted data and submit it to the Lucene index. It also records a journal of updates to HBase data that is covered by the Lucene index, so that the end-of-day application can find updated data and submit it to the Lucene index;

In S2, the configuration information for the relevant HBase tables and columns is stored in the database, and permission control and range validation are applied to all data entering HBase according to that configuration.

In S3, when building the Lucene-based query index, processing begins after the end-of-day application receives the submitted data, and comprises the following steps:

S3-1: in the database table hbase_rowkey_info, look up by rule whether the corresponding rowkey exists. If it exists, return it; if not, generate a rowkey by rule and add it to hbase_rowkey_info;

S3-2: check write permission for the submitted data fields against the hbase_column_info table. Only pre-configured fields may be submitted;

S3-3: combine the obtained rowkey with the submitted data and write the result to the corresponding HBase table;

S3-4: after the data has been written to HBase successfully, if the update touches fields needed by Lucene queries, write a journal record to the database table hbase_update_record, so that the data to be pushed to the Lucene index can be fetched incrementally;

S3-5: the end-of-day application periodically reads newly added rowkey records from the database table hbase_rowkey_info, and updated or deleted rowkey records from hbase_update_record;

S3-6: fetch from hbase_column_info the field information used by Lucene queries, distinguishing query fields from data fields;

S3-7: using the information obtained in S3-5 and S3-6, read from HBase the data that needs to be submitted to Lucene, then submit it to Lucene to update the index.
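Steps S3-5 to S3-7 amount to an incremental synchronization pass. The sketch below shows that pass with plain dictionaries standing in for HBase and for the Lucene index, so it does not use any real HBase or Lucene API. The names ChangeRecord and end_of_day_sync, the `since` watermark, and the representation of records are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    rowkey: str
    table: str
    deleted: bool
    added_at: str  # ISO timestamp, so string comparison orders records correctly


def end_of_day_sync(new_rowkeys, update_records, query_fields, hbase, index, since):
    """Apply every change recorded after `since` to the index.

    hbase:        dict mapping (table, rowkey) to a dict of column values
    index:        dict mapping (table, rowkey) to the indexed query fields
    query_fields: dict mapping table name to the field names worth indexing
    """
    # S3-5: collect added rowkeys and updated or deleted rowkeys.
    pending = [r for r in list(new_rowkeys) + list(update_records) if r.added_at > since]
    pending.sort(key=lambda r: r.added_at)  # replay changes in the order they happened

    for rec in pending:
        key = (rec.table, rec.rowkey)
        if rec.deleted:
            index.pop(key, None)  # incremental delete
            continue
        row = hbase.get(key)
        if row is None:
            continue  # row vanished since the record was written
        # S3-6 and S3-7: index only the query fields, never the bulk data fields.
        fields = query_fields.get(rec.table, ())
        index[key] = {name: row[name] for name in fields if name in row}
    return index
```

Because only the query fields are copied, the index stays much smaller than the HBase table, which is the trade-off the method describes.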

The beneficial effects achieved by the invention are:

(1) The invention builds a Lucene-based query index, which improves the performance of HBase queries with complex conditions;

(2) The invention avoids running complex queries directly on HBase, which reduces the storage and computation load on HBase and indirectly improves rowKey query performance;

(3) The invention supports label management and label-based queries, which greatly enriches the range of complex query scenarios;

(4) The invention provides a fully functional query interface, which improves the user experience.

Description of drawings

FIG. 1 is a schematic flowchart of the method in an exemplary embodiment of the invention;

FIG. 2 is a schematic diagram of the detailed steps for writing data into the index in an exemplary embodiment of the invention;

FIG. 3 is a schematic overview of the query page in an exemplary embodiment of the invention;

FIG. 4 is a schematic diagram of the view definition module of the query page in an exemplary embodiment of the invention;

FIG. 5 is a schematic diagram of the field display module of the query page in an exemplary embodiment of the invention;

FIG. 6 is a schematic diagram of the label management module of the query page in an exemplary embodiment of the invention;

FIG. 7 is a schematic diagram of the index data display module of the query page in an exemplary embodiment of the invention;

FIG. 8 is a schematic diagram of the HBase data display module of the query page in an exemplary embodiment of the invention.

Detailed description

The invention is further described below with reference to the accompanying drawings and exemplary embodiments.

FIG. 1 shows the data write and query flow in an exemplary embodiment of the invention. The flow is as follows:

Start: data is written to HBase;

S1, store HBase rowKeys and update records in a database:

Before data is added to or deleted from HBase, a copy of the rowKey and the related operation information is stored in the database, which keeps the data volume visible and controllable. The record contains the following fields: rowKey, HBase table name, logical-delete flag, add time, and update time. It provides the base data for building the Lucene query index;

When the data in a query-condition column in HBase changes, a journal record is written back to the database. It contains the fields rowKey, HBase table name, deleted flag, and add time. The journal provides the base data for incremental updates to, or incremental deletions from, the Lucene index, lets the end-of-day application find updated data and submit it to the Lucene index, and thereby keeps index queries timely and the indexed data complete;

S2, store basic information about HBase columns in the database: a definition mapping is stored in the database, covering the table name, column family, and column name of the HBase data, together with a maintained Chinese-language description of each, for display on the query page. The database also holds the configuration information for the relevant HBase tables and columns, and all data entering HBase is subject to permission control and range validation against it. This makes the data in HBase more transparent and its scope controllable, and provides the foundation for the optimized query method;

S3, build a query index based on Lucene: the end-of-day application uses the rowKey add and delete records and the update journal from S1, together with the column configuration from S2, to fetch the corresponding data from HBase and submits it to a Lucene-based search engine to update the index. Only data relevant to query conditions is extracted and submitted to the index, which is simple and efficient and reduces the load on the search engine. The search engine exposes a query interface that accepts SQL-like statements, which makes user requests easier to interpret and is flexible and convenient;

S4, custom query views: a custom query view consists of two parts, a view definition and condition configurations. The view definition determines which fields are queried, and a condition configuration constrains which data is returned. Views and conditions have a one-to-many relationship and are stored in the database. Custom query views are configured with SQL-like statements, consistent with the query interface of the search engine in S3. This yields a view definition model in which query fields and query conditions are separated: users save commonly used query fields and table information as a view in SQL-like form and, on top of that view, save commonly used query conditions as condition configurations in SQL-like form, so that query requests can be assembled and modified quickly;

S5, data label management: query results can be tagged in batches with user-defined labels, and the label-to-data relationship is stored in the database. Labeled data can be retrieved directly by filtering on the label, or by combining the label with query conditions. This labeling mechanism lets users attach custom labels to specified data in batches. Multiple labels can be combined, and labels can be combined with query views, helping users quickly locate the data they need;

S6, systematic query page: a web service application is built that integrates and displays the data from S1, S2, and S3 and the functions from S4 and S5, forming a systematic query page that is easy to operate. Bringing all data and functions together on one page lets users complete the whole query workflow quickly and without interruption, reduces page switches and clicks, and improves the user experience;

This completes the flow from data write to query.
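The separation between a view definition and its saved conditions in S4 can be pictured as a small data structure. The sketch below is hypothetical: the class name QueryView, its methods, and in particular the use of a `where` keyword to join a definition and a condition are assumptions, since the patent does not specify how the two parts are combined into the final SQL-like statement.

```python
from dataclasses import dataclass, field


@dataclass
class QueryView:
    """One view definition with any number of saved conditions (one-to-many)."""
    name: str
    definition: str                                  # which fields and tables to query
    conditions: dict = field(default_factory=dict)   # condition label -> filter expression

    def save_condition(self, label, expression):
        """Store a reusable filter expression under a label."""
        self.conditions[label] = expression

    def build_query(self, label=None):
        """Return the SQL-like statement for this view, optionally filtered.

        Joining with 'where' is an assumption made for this sketch.
        """
        if label is None:
            return self.definition
        return f"{self.definition} where {self.conditions[label]}"
```

For example, a view defined as `select c.pid,p.gender from common|c#personal|p` can keep both `pid=1 or gender=0` and `gender=0` as saved conditions, and `build_query` picks one of them at query time without the field list being retyped.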

FIG. 2 is a schematic diagram of the detailed steps for writing data into the index in an exemplary embodiment of the invention. It expands on the data write flow of FIG. 1, and the arrows in FIG. 2 indicate the order of the steps, as follows:

Start: after the application receives the submitted data, processing begins;

P1: with the data in hand, look up by rule in the database table hbase_rowkey_info whether the corresponding rowkey exists. If it exists, return it; if not, generate a rowkey by rule and add it to hbase_rowkey_info;

P2: check write permission for the submitted data fields against the hbase_column_info table. Only pre-configured fields may be submitted, which keeps the data controllable and easy to query;

P3: combine the obtained rowkey with the submitted data and write the result to the corresponding HBase table;

P4: after the data has been written to HBase successfully, if the update touches fields needed by Lucene queries, write a journal record to the database table hbase_update_record, so that the data to be pushed to the Lucene index can be fetched incrementally;

P5: the application periodically reads newly added rowkey records from the database table hbase_rowkey_info, and updated or deleted rowkey records from hbase_update_record;

P6: fetch from hbase_column_info the field information used by Lucene queries. Query fields are distinguished from data fields here mainly because the stored data fields typically outnumber the query-condition fields many times over, so indexing all of them would be very costly. Data fields are therefore left out of the index to save resources and reduce load;

P7: using the information obtained in P5 and P6, read from HBase the data that needs to be submitted to Lucene for the index update;

End: once the data to be updated has been obtained, submit it to Lucene to update the index;
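The write path in P1 to P4 can be sketched with in-memory stand-ins for the three metadata tables and for HBase. Nothing here calls a real HBase client. The function name write_row, the argument layout, and the rowkey generation rule (`table:natural_key`) are assumptions, because the patent only says rowkeys are looked up and generated "by rule".

```python
def write_row(natural_key, table, values, rowkeys, allowed_columns,
              query_fields, hbase, update_log, now):
    """Write one row following P1 to P4 and return its rowkey.

    rowkeys:         dict standing in for hbase_rowkey_info
    allowed_columns: dict of table -> set of configured columns (hbase_column_info)
    query_fields:    dict of table -> set of columns used by index queries
    hbase:           dict standing in for the HBase table
    update_log:      list standing in for hbase_update_record
    """
    # P1: reuse the rowkey if it is registered, otherwise generate and register it.
    key = (table, natural_key)
    if key not in rowkeys:
        rowkeys[key] = f"{table}:{natural_key}"  # hypothetical generation rule
    rowkey = rowkeys[key]

    # P2: only pre-configured columns may be written.
    unknown = set(values) - allowed_columns.get(table, set())
    if unknown:
        raise PermissionError(f"columns not configured for {table}: {sorted(unknown)}")

    # P3: combine rowkey and data, then write to the table.
    hbase.setdefault((table, rowkey), {}).update(values)

    # P4: journal the change only when an indexed field was touched.
    if set(values) & query_fields.get(table, set()):
        update_log.append(
            {"rowkey": rowkey, "table": table, "deleted": False, "added_at": now}
        )
    return rowkey
```

The sketch keeps the order given in the text, so a rowkey is registered in P1 even if P2 later rejects the write. A real implementation might prefer to validate first or to wrap both steps in one transaction.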

FIG. 3 is an overview of the query page in an exemplary embodiment of the invention. It is the query interface built on the query data flow of FIG. 1, and FIGS. 4 to 8 are detailed views of the interface modules in FIG. 3;

In the view definition module shown in FIG. 4, clicking the "Create View" button lets the user configure and save a view definition in SQL-like syntax, which determines the fields to be queried. The example in the figure is "select c.pid,p.gender,p.telephone,p.name,p.email,p.mobile_active_flag,p.mail_flag,p.country from common|c#personal|p", where "|" introduces an alias and "#" denotes a query across associated tables. A preset view definition, if one exists, can be selected directly from the drop-down box. The conditions for the current query are entered in the query condition input box, for example "pid=1 or gender=0" in the figure; because the query runs against the Lucene index, no alias needs to be specified. To save the current query conditions, click the "Save" button above the condition input box. To browse and select preset query conditions, click the "History" button above the condition input box. Clicking the "Query" button above the condition input box displays the data returned by the index query in module D (the index data display module).
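The view definition syntax described above, with "|" introducing an alias and "#" joining associated tables, is simple enough to parse by hand. The function below is a hypothetical sketch: parse_view_definition is not named in the patent, and the sketch assumes lowercase `select` and `from` keywords and comma-separated fields, as in the example from the figure.

```python
def parse_view_definition(definition):
    """Split a view definition into (alias, column) pairs and an alias -> table map."""
    head, _, tail = definition.strip().partition(" from ")
    if not head.startswith("select ") or not tail:
        raise ValueError("expected 'select <fields> from <tables>'")

    tables = {}
    for part in tail.split("#"):                      # '#' joins associated tables
        name, _, alias = part.strip().partition("|")  # '|' introduces an alias
        tables[alias or name] = name

    fields = []
    for item in head[len("select "):].split(","):
        # rpartition leaves the alias empty when a field has no 'alias.' prefix
        alias, _, column = item.strip().rpartition(".")
        fields.append((alias, column))
    return fields, tables
```

Applied to a shortened form of the example, `select c.pid,p.gender from common|c#personal|p`, the function yields the pairs `("c", "pid")` and `("p", "gender")` together with a map from `c` to `common` and from `p` to `personal`.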

In the field display module shown in FIG. 5, once a view definition has been selected, the back end queries the database for information about the relevant fields according to the view definition and displays it as a paginated table. If the data fields the user wants to see differ from the view definition, the fields to display can be selected manually; by default all are selected.

In the label management module shown in FIG. 6, the user selects an existing data label in the drop-down input box, for example "Demo Label 1" in the figure. Clicking the "Label Filter" button to the right of the drop-down input box triggers a secondary filter query on the Lucene index, combined with the search conditions of module A (the view definition module), and the results are displayed in the index data display module (FIG. 7). To label data, the user selects an existing data label in the drop-down input box or enters a custom label name, ticks the HBase rowKey records to be labeled in the index data display module (FIG. 7), and clicks the "Save Label" button to the right of the drop-down input box in the label management module (FIG. 6). This saves the mapping between the data label and the HBase rowKeys. If the data label does not exist, it is created automatically and saved to the database.
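The label behavior described above (batch tagging, automatic creation of missing labels, and secondary filtering of index hits) can be sketched with a small in-memory store. The class TagStore and its methods are hypothetical, and treating several labels as a logical AND is an assumption: the patent says labels can be combined but does not say how.

```python
from collections import defaultdict


class TagStore:
    """In-memory stand-in for the label-to-rowKey mapping kept in the database."""

    def __init__(self):
        self._rowkeys_by_label = defaultdict(set)

    def tag(self, label, rowkeys):
        """Attach a label to a batch of rowKeys; the label is created if missing."""
        self._rowkeys_by_label[label].update(rowkeys)

    def rowkeys(self, *labels):
        """Return the rowKeys that carry every given label (AND semantics assumed)."""
        sets = [self._rowkeys_by_label.get(label, set()) for label in labels]
        return set.intersection(*sets) if sets else set()

    def filter(self, search_hits, *labels):
        """Secondary filter: keep index hits that carry the labels, preserving order."""
        tagged = self.rowkeys(*labels)
        return [rk for rk in search_hits if rk in tagged]
```

Here `filter` plays the role of the "Label Filter" button: it takes the rowKeys already returned by the index query and narrows them to those carrying the chosen labels.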

In the index data display module shown in FIG. 7, after a Lucene index query is triggered, the HBase rowKey (UID), the fields related to the query conditions, and the data labels are displayed as a paginated table. Each rowKey carries a hyperlink; clicking it triggers an HBase query whose result is displayed in module E (the HBase data display module).

In the HBase data display module shown in FIG. 8, after the rowKey hyperlink is clicked in the index data display module (FIG. 7), the back end queries HBase and displays the result as a paginated table. This completes the query flow.

The invention mainly provides a method and system for optimizing HBase big data queries, with the following beneficial effects:

(1) The invention builds a Lucene-based query index, which improves the performance of HBase queries with complex conditions;

(2) The invention avoids running complex queries directly on HBase, which reduces the storage and computation load on HBase and indirectly improves rowKey query performance;

(3) The invention supports label management and label-based queries, which greatly enriches the range of complex query scenarios;

(4) The invention provides a fully functional, systematic query interface, which streamlines the query workflow and improves the user experience.

The above embodiments do not limit the invention in any way. All other improvements and applications made to the above embodiments by way of equivalent transformation fall within the scope of protection of the invention.

Claims (4)

Translated fromChinese
1.一种优化Hbase大数据查询的方法,其特征在于:包括数据库存储Hbase的rowKey和更新记录,在数据库存储Hbase中列的基本信息,基于Lucene建立查询索引,自定义查询视图,数据标签管理和系统化的查询页面的步骤,具体为:1. a method for optimizing HBase big data query, it is characterized in that: comprise the rowKey and the update record of database storage HBase, the basic information of column in database storage HBase, build query index based on Lucene, self-defined query view, data label management and systematic steps to query the page, specifically:S1,数据库存储Hbase的rowKey和更新记录:S1, the database stores the rowKey and update records of Hbase:在往Hbase中添加或删除数据之前,将数据rowKey和相关操作信息在数据库中存储一份,内容包括以下字段:rowKey、Hbase表名、逻辑删除标志、添加时间、更新时间,用于为建立lucene查询索引提供基础数据支撑;Before adding or deleting data to Hbase, store a copy of the data rowKey and related operation information in the database, including the following fields: rowKey, Hbase table name, logical delete flag, add time, update time, used for establishing lucene Query index provides basic data support;当Hbase中查询条件列数据发生变更时,往数据库中回写流水记录信息,包括字段:rowKey、Hbase表名、是否删除标志以及添加时间,用于为lucene索引增量更新或增量删除提供基础数据支撑;When the query condition column data in Hbase changes, write back the flow record information to the database, including fields: rowKey, Hbase table name, whether to delete the flag, and the addition time, which is used to provide the basis for incremental update or incremental deletion of lucene index data support;S2,在数据库存储Hbase中列的基本信息:在数据库中存一份定义关系,包括Hbase数据的表名、列簇和列名,并维护其对应的中文释义,用于查询页面展示;S2, store the basic information of the columns in HBase in the database: store a definition relationship in the database, including the table name, column cluster and column name of the HBase data, and maintain its corresponding Chinese definition for query page display;S3,基于Lucene建立查询索引:日终应用通过S1中rowKey和更新记录,以及S2中列的基本信息,从Hbase中取出相应的数据,提交给基于lucene的搜索引擎更新索引,所述搜索引擎对外提供类sql语句方式的查询接口;S3, build a query index based on Lucene: The end-of-day application retrieves the corresponding data from Hbase through the rowKey and update records in S1, and the basic 
information of the column in S2, and submits it to the lucene-based search engine to update the index, and the search engine externally Provides a query interface in the form of SQL-like statements;S4,自定义查询视图:所述自定义查询视图包括视图定义和条件配置两个部分,所述视图定义用于确认查询的字段范围,所述条件配置用于约束查询的数据范围,视图与条件为一对多关系,存储在数据库中;所述自定义查询视图采用类sql语句的方式进行配置,与所述S3中搜索引擎的查询接口一致;S4, custom query view: the custom query view includes two parts: view definition and condition configuration, the view definition is used to confirm the field range of the query, and the condition configuration is used to constrain the data range of the query, view and condition It is a one-to-many relationship, and is stored in the database; the self-defined query view is configured by a sql-like statement, which is consistent with the query interface of the search engine in the S3;S5,数据标签管理:对查询出来的结果数据,批量打上用户自定义的标签,并将关系存储在数据库中;打上标签的数据,可以通过标签直接过滤查询出来,或者与查询条件组合查询结果;S5, data label management: label the result data from the query with user-defined labels in batches, and store the relationship in the database; the labelled data can be directly filtered and queried through the label, or the query result can be combined with the query conditions;S6,系统化的查询页面:建立一个web服务应用,将S1、S2和S3中的数据以及S4、S5中的功能进行集成和展示,形成系统化的查询功能页面。S6, systematic query page: build a web service application, integrate and display the data in S1, S2 and S3 and the functions in S4 and S5 to form a systematic query function page.2.如权利要求1所述的一种优化Hbase大数据查询的方法,其特征在于:所述S1中,数据库存储Hbase所有rowKey信息,记录rowKey的增删时间,用于日终应用查出增删的数据提交给lucene索引;记录Hbase中Lucene索引相关数据的更新流水记录,用于日终应用查出更新数据提交给lucene索引。2. 
The method for optimizing HBase big data queries as claimed in claim 1, characterized in that: in step S1, the database stores all rowKey information of HBase and records the time at which each rowKey is added or deleted, so that the end-of-day application can find the added and deleted data and submit it to the Lucene index; the database also records an update log of the HBase data related to the Lucene index, so that the end-of-day application can find the updated data and submit it to the Lucene index.
3. The method for optimizing HBase big data queries as claimed in claim 2, characterized in that: in step S2, configuration information for the HBase-related tables and columns is stored in the database, and permission control and range verification are performed on all data entering HBase according to that configuration information.
4. The method for optimizing HBase big data queries as claimed in claim 3, characterized in that: in step S3, in establishing the query index based on Lucene, the end-of-day application begins its processing logic after receiving the submitted data, which specifically comprises the following steps:
S3-1: In the hbase_rowkey_info table of the database, query according to the rules whether the corresponding rowkey exists; if it exists, return the rowkey; if it does not exist, generate a rowkey according to the rules and add it to the hbase_rowkey_info table;
S3-2: For the submitted data fields, perform a write-permission check against the hbase_column_info table; only fields that have been configured in advance may be submitted;
S3-3: Combine the obtained rowkey with the submitted data and write the result to the corresponding HBase table;
S3-4: After the data has been written to HBase successfully, if the updated data involves fields required by Lucene queries, write a log record to the hbase_update_record table of the database, so that the data to be updated in the Lucene index can be obtained incrementally;
S3-5: The end-of-day application periodically obtains newly added rowkey records from the hbase_rowkey_info table of the database, and obtains updated or deleted rowkey records from the hbase_update_record table;
S3-6: Obtain the field information used by Lucene queries from the hbase_column_info table of the database, distinguishing query fields from data fields;
S3-7: Using the information obtained in S3-5 and S3-6, query HBase for the data that needs to be submitted to Lucene for an index update, and submit that data to Lucene to update the index.
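The flow of steps S3-1 to S3-7 can be illustrated with a minimal sketch. It is not the claimed implementation: an in-memory SQLite database stands in for the relational database, plain dictionaries stand in for the HBase table and the Lucene index, and the column names (biz_id, added_at, writable, is_query_field, updated_at), the reversed-id rowkey rule, and the explicit `now` and `since` timestamps are all assumptions introduced for illustration. Only the three table names come from the claims. The range verification of claim 3 and deletion handling on the write path are omitted.

```python
import sqlite3

# Stand-ins (assumptions): dicts play the HBase table and the Lucene index.
hbase_table = {}    # rowkey -> {column: value}
lucene_index = {}   # rowkey -> {query field: value}

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE hbase_rowkey_info (biz_id TEXT PRIMARY KEY, rowkey TEXT, added_at REAL);
CREATE TABLE hbase_column_info (col TEXT PRIMARY KEY, writable INTEGER, is_query_field INTEGER);
CREATE TABLE hbase_update_record (id INTEGER PRIMARY KEY AUTOINCREMENT,
                                  rowkey TEXT, updated_at REAL);
""")
# Hypothetical column configuration: "name" is indexed, "note" is data only.
db.executemany("INSERT INTO hbase_column_info VALUES (?, ?, ?)",
               [("name", 1, 1), ("note", 1, 0)])


def get_or_create_rowkey(biz_id, now):
    """S3-1: return the existing rowkey, or generate and register a new one."""
    row = db.execute("SELECT rowkey FROM hbase_rowkey_info WHERE biz_id = ?",
                     (biz_id,)).fetchone()
    if row:
        return row[0]
    rowkey = biz_id[::-1]  # hypothetical generation rule: reversed business id
    db.execute("INSERT INTO hbase_rowkey_info VALUES (?, ?, ?)", (biz_id, rowkey, now))
    return rowkey


def submit(biz_id, data, now):
    """S3-1 to S3-4: resolve the rowkey, check columns, write, log indexed changes."""
    rowkey = get_or_create_rowkey(biz_id, now)
    # Maps each writable column to its is_query_field flag.
    writable = dict(db.execute(
        "SELECT col, is_query_field FROM hbase_column_info WHERE writable = 1"))
    rejected = [c for c in data if c not in writable]   # S3-2
    if rejected:
        raise PermissionError(f"columns not configured for writing: {rejected}")
    hbase_table.setdefault(rowkey, {}).update(data)     # S3-3
    if any(writable[c] for c in data):                  # S3-4
        db.execute("INSERT INTO hbase_update_record (rowkey, updated_at) VALUES (?, ?)",
                   (rowkey, now))
    return rowkey


def end_of_day(since):
    """S3-5 to S3-7: collect rowkeys changed after `since` and refresh the index."""
    changed = {r for (r,) in db.execute(
        "SELECT rowkey FROM hbase_rowkey_info WHERE added_at > ?", (since,))}
    changed |= {r for (r,) in db.execute(
        "SELECT rowkey FROM hbase_update_record WHERE updated_at > ?", (since,))}
    query_fields = [c for (c,) in db.execute(           # S3-6
        "SELECT col FROM hbase_column_info WHERE is_query_field = 1")]
    for rowkey in changed:                              # S3-7
        row = hbase_table.get(rowkey)
        if row is None:
            lucene_index.pop(rowkey, None)              # row no longer in HBase
        else:
            lucene_index[rowkey] = {f: row[f] for f in query_fields if f in row}
    return sorted(changed)
```

In this sketch, a write that touches only the non-indexed column "note" leaves hbase_update_record unchanged, so the next end-of-day run has nothing to re-index for that row. That is the incremental behaviour S3-4 describes.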
CN202010305095.3A | 2020-04-17 | 2020-04-17 | Method for optimizing Hbase large data query | Active | CN111488379B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010305095.3A, CN111488379B (en) | 2020-04-17 | 2020-04-17 | Method for optimizing Hbase large data query

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010305095.3A, CN111488379B (en) | 2020-04-17 | 2020-04-17 | Method for optimizing Hbase large data query

Publications (2)

Publication Number | Publication Date
CN111488379A (en) | 2020-08-04
CN111488379B (en) | 2022-07-19

Family

ID=71812831

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010305095.3A, Active, CN111488379B (en) | Method for optimizing Hbase large data query | 2020-04-17 | 2020-04-17

Country Status (1)

Country | Link
CN (1) | CN111488379B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104636389A (en) * | 2013-11-14 | 2015-05-20 | 博雅网络游戏开发(深圳)有限公司 | Hbase database real-time query achieving method and system
CN106227788A (en) * | 2016-07-20 | 2016-12-14 | 浪潮软件集团有限公司 | Database query method based on Lucene
CN106326429A (en) * | 2016-08-25 | 2017-01-11 | 武汉光谷信息技术股份有限公司 | Hbase second-level query scheme based on solr

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112000849A (en) * | 2020-08-21 | 2020-11-27 | 河南中原消费金融股份有限公司 | Unified label library management method, device, equipment and storage medium
CN112000849B (en) * | 2020-08-21 | 2024-08-09 | 河南中原消费金融股份有限公司 | Unified tag library management method, device, equipment and storage medium
CN115098568A (en) * | 2022-07-18 | 2022-09-23 | 中国工商银行股份有限公司 | Data processing method, apparatus, device, medium, and program product

Also Published As

Publication number | Publication date
CN111488379B (en) | 2022-07-19

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
