CN113641796A


Info

Publication number: CN113641796A
Application number: CN202111007463.7A
Authority: CN
Inventors: 姚小丰
Original assignee: Ping An Medical and Healthcare Management Co Ltd
Current assignee: Shenzhen Ping An Medical Health Technology Service Co Ltd
Priority date: 2021-08-30
Filing date: 2021-08-30
Publication date: 2021-11-12

Abstract

The application relates to the technical field of artificial intelligence and provides a data search method, system and storage medium. In the method, a relay server obtains a user's search request, determines the user's operation intention from the search request, looks up the distribution server required by the search request according to the operation intention, extracts search data from the search request, and sends the search data to the distribution server corresponding to the operation intention. The distribution server splits the search data into sub-search data and distributes each piece of sub-search data to a preset query engine, and each preset query engine obtains the target data corresponding to its sub-search data from a preset database. The distribution server merges the target data to generate a query result, and the relay server generates a target query result based on the query result and sends it to the user. Because the distribution server is selected according to the operation intention, a large number of invalid searches are avoided and data query efficiency is improved.

Description

Translated from Chinese

Data search method, system and storage medium

Technical field

The present application relates to the technical field of artificial intelligence, and in particular to a data search method, system and storage medium.

Background

Today, as information technology matures, enterprise production data keeps growing, and knowledge expands explosively, every enterprise faces the practical problem of how to retrieve its internal operational data quickly and effectively and improve its knowledge management.

At present, the common data search approach stores data in a relational database. Retrieval of key data in the system is usually based on exact matching of some database fields and fuzzy search on others. When the data volume is large, this approach performs poorly and search efficiency is low.

Summary of the invention

The main purpose of the present application is to provide a data search method, system and storage medium that improve data search efficiency.

To achieve this purpose, the present application provides a data search method applied to a data search system. The data search system includes a relay server, a distribution server and preset query engines. The data search method includes:

the relay server obtains a user's search request, determines the user's operation intention according to the search request, looks up the distribution server required by the search request according to the operation intention, extracts search data from the search request, and sends the search data to the distribution server corresponding to the operation intention;

the distribution server splits the received search data to obtain a plurality of sub-search data, and distributes each of the sub-search data to a different preset query engine;

each preset query engine obtains, based on the received sub-search data, the target data corresponding to the sub-search data from a preset database, and sends the target data to the distribution server;

the distribution server receives the target data sent by each preset query engine, merges all the target data to generate a query result, and sends the query result to the relay server;

the relay server performs sensitive-word filtering on all fields of the query result, replaces the fields containing sensitive words with unrecognizable target text to generate a target query result, and sends the target query result to the user.

Preferably, the user's operation intention includes a query for business data, each preset query engine is configured with a natural language processing model, and the step in which each preset query engine obtains, based on the received sub-search data, the target data corresponding to the sub-search data from the preset database includes:

each preset query engine performs feature extraction on the sub-search data and converts the result into a feature vector;

the feature vector is input into the natural language processing model to obtain the target data, wherein the natural language processing model is a pre-trained neural network model used to obtain target data from the preset database according to the feature vector.

Preferably, the sub-search data is text, and the step in which each preset query engine performs feature extraction on the sub-search data includes:

each preset query engine performs word segmentation on the text to obtain a word segmentation result;

part-of-speech tagging is performed on the word segmentation result to obtain a tagging result;

the keywords of the text are determined according to the tagging result, and the keywords are converted into word vectors to obtain the feature vector.

Further, before the feature vector is input into the natural language processing model, the method further includes:

each preset query engine obtains sample data and performs feature extraction on the sample data to form a sample feature vector;

the sample feature vector and the expected standard query result are input into an initial natural language processing model for training;

it is judged whether the training result output by the initial natural language processing model meets the requirements;

if so, the initial natural language processing model whose training result meets the requirements is used as the natural language processing model;

if not, the parameters of the initial natural language processing model are adjusted according to the training result, and the step of inputting the sample feature vector and the expected standard query result into the initial natural language processing model for training is executed again, so that the model with adjusted parameters is retrained until its training result meets the requirements.
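The train, check, adjust and retrain loop described above can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify the network architecture or the acceptance criterion, so a single-layer logistic model stands in for the "initial natural language processing model", and "meets the requirements" is taken to mean reaching a target accuracy. All function and parameter names are hypothetical.

```python
import math


def predict(weights, bias, x):
    """Logistic output of the stand-in model for one feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(weights, bias, samples, labels):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict(weights, bias, x) >= 0.5) == bool(y)
               for x, y in zip(samples, labels))
    return hits / len(samples)


def train_until_ok(samples, labels, target_acc=1.0, lr=0.5, max_rounds=10_000):
    """Repeat: check the requirement, otherwise adjust parameters and retrain."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for round_no in range(max_rounds):
        # "Judge whether the training result meets the requirements."
        if accuracy(weights, bias, samples, labels) >= target_acc:
            return weights, bias, round_no
        # "If not, adjust the parameters" (one pass of gradient descent).
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    raise RuntimeError("training did not meet the requirement")
```

A real engine would replace the stand-in model and the accuracy check with whatever network and acceptance test the deployment uses; only the loop structure is taken from the text.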

Preferably, the distribution server corresponding to the operation intention includes a self-service SQL query service, and the step of sending the search data to the distribution server corresponding to the operation intention includes:

the relay server judges whether the search data is an SQL query statement;

if so, the search data is sent to the self-service SQL query service.

Preferably, the step of sending the target query result to the user includes:

the relay server determines the user's permission scope from the search request;

data in the target query result that falls outside the permission scope is deleted, and the target query result with that data removed is sent to the user.

The relay server encapsulates the target query result and determines the data type of the target query result;

the encapsulated target query result and the data type are sent to the user's terminal, so that the terminal parses the target query result and displays it according to the data type.
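As a rough illustration of the two delivery steps above, the sketch below removes data outside the user's permission scope and then wraps the result together with its data type. It assumes that the permission scope is a set of field names and that the envelope is JSON; the application does not state either, and the names `filter_by_scope` and `encapsulate` are hypothetical.

```python
import json


def filter_by_scope(rows, allowed_fields):
    """Delete every field that is outside the user's permission scope."""
    return [{k: v for k, v in row.items() if k in allowed_fields}
            for row in rows]


def encapsulate(rows, data_type):
    """Wrap the result with its data type so the terminal knows how to show it."""
    return json.dumps({"type": data_type, "payload": rows}, ensure_ascii=False)


# Example: the user may see names and departments but not salaries.
rows = [{"name": "Li", "department": "claims", "salary": 1000}]
visible = filter_by_scope(rows, {"name", "department"})
message = encapsulate(visible, "table")
```

The terminal would parse `message`, read `type`, and pick a renderer (for instance a table view) accordingly.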

The present application also provides a data search system, which includes:

a relay server, configured to obtain a user's search request, determine the user's operation intention according to the search request, look up the distribution server required by the search request according to the operation intention, extract search data from the search request, and send the search data to the distribution server corresponding to the operation intention;

a distribution server, configured to split the received search data to obtain a plurality of sub-search data, and distribute each of the sub-search data to a different preset query engine;

a plurality of preset query engines, each configured to obtain, based on the received sub-search data, the target data corresponding to the sub-search data from a preset database, and send the target data to the distribution server;

the distribution server is further configured to receive the target data sent by each preset query engine, merge all the target data to generate a query result, and send the query result to the relay server;

the relay server is further configured to perform sensitive-word filtering on all fields of the query result, replace the fields containing sensitive words with unrecognizable target text to generate a target query result, and send the target query result to the user.

Preferably, the user's operation intention includes a query for business data, and each preset query engine is configured with a natural language processing model, wherein:

each preset query engine is further configured to perform feature extraction on the sub-search data, convert the result into a feature vector, and input the feature vector into the natural language processing model to obtain the target data, wherein the natural language processing model is a pre-trained neural network model used to obtain target data from the preset database according to the feature vector.

The present application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of any one of the methods described above are implemented.

In the data search method, system and storage medium provided by the present application, the relay server obtains the user's search request, determines the user's operation intention according to the search request, looks up the distribution server required by the search request according to the operation intention, extracts search data from the search request, and sends the search data to the distribution server corresponding to the operation intention. The distribution server splits the search data into a plurality of sub-search data and distributes each of them to a different preset query engine, and each preset query engine obtains the target data corresponding to its sub-search data from the preset database. The distribution server merges the target data to generate a query result. The relay server performs sensitive-word filtering on all fields of the query result and replaces the fields containing sensitive words with unrecognizable target text to generate a target query result, which ensures data security. Finally, the target query result is sent to the user.

The present application determines the user's operation intention from the search request and selects the corresponding distribution server based on that intention, so that the required data is found quickly, a large number of invalid searches are avoided, data query efficiency is improved, and different query modes are supported. In addition, because the sub-search data are distributed to different preset query engines, multiple engines obtain the corresponding target data from the preset database at the same time. This realizes parallel querying and further improves query efficiency.

Description of drawings

FIG. 1 is a schematic flowchart of a data search method according to an embodiment of the present application;

FIG. 2 is a schematic structural block diagram of a data search system according to an embodiment of the present application.

The realization of the purpose, the functional characteristics and the advantages of the present application are further described below with reference to the accompanying drawings and in conjunction with the embodiments.

Detailed description

To make the purpose, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present application and do not limit it.

The embodiments of the present application may acquire and process the relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, methods, technology and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation and interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning and deep learning.

Referring to FIG. 1, the present application proposes a data search method executed by a data search system. The data search system may be a computer cluster that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

In the present application, the data search method is used to improve data search efficiency. In many scenarios data search is the channel through which users quickly obtain the information they need. In the medical field, for example, the medical record information a user needs can be queried from a massive set of electronic medical records based on an artificial intelligence model, which helps provide the user with medical record references. Referring to FIG. 1, in one embodiment the data search system includes a relay server, a distribution server and preset query engines, and the data search method includes the following steps:

S11. The relay server obtains the user's search request, determines the user's operation intention according to the search request, looks up the distribution server required by the search request according to the operation intention, extracts search data from the search request, and sends the search data to the distribution server corresponding to the operation intention;

S12. The distribution server splits the received search data to obtain a plurality of sub-search data, and distributes each of the sub-search data to a different preset query engine;

S13. Each preset query engine obtains, based on the received sub-search data, the target data corresponding to the sub-search data from a preset database, and sends the target data to the distribution server;

S14. The distribution server receives the target data sent by each preset query engine, merges all the target data to generate a query result, and sends the query result to the relay server;

S15. The relay server performs sensitive-word filtering on all fields of the query result, replaces the fields containing sensitive words with unrecognizable target text to generate a target query result, and sends the target query result to the user.
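Steps S11 to S14 can be pictured with the in-process sketch below. It is a toy model, not the claimed system: the servers are plain functions, the preset database is a dictionary, the search data is split on whitespace, and the operation intention is passed in with the request rather than inferred. Every name here (`DATABASE`, `DISTRIBUTION_SERVERS`, `relay_server`, and so on) is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the preset database.
DATABASE = {"fever": ["record-1"], "cough": ["record-2", "record-3"]}


def query_engine(sub_query):
    """S13: one preset query engine looks up one piece of sub-search data."""
    return DATABASE.get(sub_query, [])


def business_distribution_server(search_data):
    sub_queries = search_data.split()                  # S12: split the search data
    with ThreadPoolExecutor() as pool:                 # S13: engines run in parallel
        parts = list(pool.map(query_engine, sub_queries))
    return [item for part in parts for item in part]   # S14: merge the target data


# Hypothetical mapping from operation intention to distribution server.
DISTRIBUTION_SERVERS = {"query_business_data": business_distribution_server}


def relay_server(request):
    # S11: the intention is supplied by the caller in this sketch.
    intent = request.get("intent", "query_business_data")
    server = DISTRIBUTION_SERVERS[intent]
    result = server(request["query"])
    # S15 (sensitive-word filtering) would be applied here before returning.
    return result
```

Calling `relay_server({"query": "fever cough"})` walks the request through one distribution server and two engine lookups and returns the merged records in sub-query order.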

So that a single search box can support SQL (Structured Query Language) execution, business data queries, table queries, field queries, system menu queries, product data queries, permission data queries and so on, the present application introduces different distribution servers to detect and judge the kind of search being made. In actual business use, the application identifies the current user's specific operation intention from the search request: whether the user wants to access an interface menu, query data in a database table, query a specific piece of the company's business data, query certain fields of a table, execute a self-service SQL statement, and so on. Once the operation intention is determined, requests are handled differently according to that intention, and the corresponding distribution server is used to obtain the data. If a self-service SQL statement is to be executed, the business logic of the self-service query service is run. If the request is to query business interface data, jump quickly to an interface, or the like, an NLU (Natural Language Understanding) service matches the user's operation intention to the query result.
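A minimal sketch of the SQL-versus-NLU split described above follows. The application does not say how the relay server recognizes an SQL statement, so the leading-keyword check used here is an assumption, and the service names `self_service_sql` and `nlu` are hypothetical labels.

```python
import re

# Assumed heuristic: treat input starting with a read-only SQL keyword as SQL.
SQL_PREFIX = re.compile(r"^\s*(select|with|show|describe|explain)\b",
                        re.IGNORECASE)


def looks_like_sql(search_data: str) -> bool:
    """Return True when the search data starts like an SQL query statement."""
    return bool(SQL_PREFIX.match(search_data))


def route(search_data: str) -> str:
    """Pick the service that should handle the search data."""
    return "self_service_sql" if looks_like_sql(search_data) else "nlu"
```

A keyword prefix is easy to fool (a sentence such as "show me the menu" would be misrouted), so a production system would more plausibly try to parse the statement with a real SQL parser before routing it.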

Specifically, as described in step S11 above, the relay server is directly connected to the user's terminal. It receives the user's search request and returns the search result to the user. The user enters search conditions in the search bar to initiate a search request, and the relay server obtains that request. The search request may include the user's identity information, the user's permissions, and the search data. The search data may be text, voice, images, and so on.

Here, the operation intention refers to what data the user wants to obtain or where the user wants to obtain it from. The operation intention can be determined from the user information or from the data type of the search data: a relationship list between operation intentions and user information or data types is built in advance, and the user's operation intention is determined from that list. For example, after a search request is received, the user's operation intention can be looked up in the relationship list according to the data type of the search data, such as whether the user wants to access an interface menu, query data in a database table, query a specific piece of the company's business data, query certain fields of a table, or execute a self-service SQL statement. When the data type is a business data type, for instance, the user's operation intention is to obtain a specific piece of the company's business data from the distribution server corresponding to that business data type.

As another example, the user information in the search request can be extracted and used to find the distribution server that the user has used most often in the past. That server is taken as the target distribution server, and the action of calling the target distribution server to obtain data is taken as the user's operation intention. The data search is then carried out once the operation intention is clear, which narrows the search scope and improves data search efficiency.
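The two ways of determining the operation intention described above (a pre-built relationship list keyed by data type, and the user's most frequently used distribution server) might look like the sketch below. The table contents, the default intention, and the function names are illustrative assumptions, not values taken from the application.

```python
from collections import Counter

# Hypothetical relationship list: data type of the search data -> intention.
INTENT_BY_DATA_TYPE = {
    "sql": "run_self_service_sql",
    "business": "query_business_data",
    "menu": "open_menu",
}


def intent_from_data_type(data_type, default="query_business_data"):
    """Look the operation intention up in the pre-built relationship list."""
    return INTENT_BY_DATA_TYPE.get(data_type, default)


def most_used_server(history):
    """Return the distribution server the user called most often, if any.

    `history` is a list of server names taken from the user's past requests.
    """
    if not history:
        return None
    return Counter(history).most_common(1)[0][0]
```

With a history of `["sql", "business", "sql"]` the second function selects `"sql"` as the target distribution server; an empty history yields `None`, in which case the data-type lookup can serve as the fallback.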

Different operation intentions therefore correspond to different distribution servers. The distribution server is selected according to the user's operation intention and the data is obtained through it, which avoids a large number of invalid searches and improves search efficiency. The distribution server is connected to the preset query engines. It distributes the split search data to the corresponding preset query engines for querying and receives the query results that match the search data. A query result may be a sentence or a passage of text. For example, if the search data consists of several keywords, the query result may be a text that contains at least one of those keywords.

As described in steps S12 and S13 above, each piece of sub-search data can be distributed to a preset query engine through message middleware, so that the engine obtains the corresponding target data from the preset database. There may be multiple preset query engines. When there are, the sub-search data can be divided evenly among them. Each preset query engine obtains the target data corresponding to its sub-search data from the preset database and sends the target data to the distribution server.
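The even split across engines can be illustrated with a simple round-robin assignment. This sketch leaves out the message middleware and only shows which engine would receive which sub-search data; `distribute_evenly` and the engine names are hypothetical.

```python
def distribute_evenly(sub_queries, engines):
    """Assign sub-search data to engines in round-robin order.

    No engine ends up with more than one item above any other engine.
    """
    assignment = {engine: [] for engine in engines}
    for i, sub_query in enumerate(sub_queries):
        assignment[engines[i % len(engines)]].append(sub_query)
    return assignment


plan = distribute_evenly(["a", "b", "c", "d", "e"], ["engine-1", "engine-2"])
```

In a deployment that uses message middleware, each list in `plan` would be published to the queue or topic that the corresponding engine consumes.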

In this embodiment, the preset query engine interacts directly with the preset database and is responsible for data access. During data access, the preset query engine first converts the sub-search data and submits it to a query executor. The query executor issues queries concurrently, over multiple channels, to each cluster of the preset database. As soon as the query on one cluster succeeds, the queries on the other clusters are terminated and the target data is returned.
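The "query all clusters concurrently, keep the first success, stop the rest" behavior of the query executor can be sketched with `asyncio`. The cluster query is simulated with a sleep, and a missing result is modeled as a `LookupError`; both are assumptions, as are the names `query_cluster` and `first_successful`.

```python
import asyncio


async def query_cluster(name, delay, data):
    """Simulated cluster query: waits `delay` seconds, fails if data is None."""
    await asyncio.sleep(delay)
    if data is None:
        raise LookupError(f"{name}: no result")
    return name, data


async def first_successful(coros):
    """Run all queries concurrently and return the first successful result."""
    tasks = [asyncio.create_task(c) for c in coros]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except Exception:
                continue  # this cluster failed, wait for the next one
        raise LookupError("all clusters failed")
    finally:
        for task in tasks:
            task.cancel()  # terminate the queries that are still running


result = asyncio.run(first_successful([
    query_cluster("cluster-a", 0.20, ["late"]),
    query_cluster("cluster-b", 0.01, None),
    query_cluster("cluster-c", 0.05, ["row"]),
]))
```

Here `cluster-b` fails first and is skipped, `cluster-c` supplies the result, and the still-running `cluster-a` query is cancelled.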

As described in step S14 above, the distribution server receives the target data sent by each preset query engine, merges the target data to generate the query result, and sends the query result to the relay server.

By configuring preset message middleware, this embodiment provides a high-throughput, highly reliable message channel between the distribution server and the query engines, so as to meet the ever-growing demand for data queries.

As described in step S15 above, after generating the query result, the distribution server compresses it and sends it to the relay server. The relay server in this embodiment receives the query result, decompresses it, and extracts the data. It then performs sensitive-word filtering on all fields of the query result and replaces the fields containing sensitive words with unrecognizable text, such as garbled characters, to obtain the target query result. Finally, it sends the target query result to the user, and the result is displayed as a list on the user's terminal.
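The compress, decompress and mask sequence in step S15 can be sketched as below. The application names neither a compression format nor a serialization, so `zlib` over JSON is assumed; the sensitive-word list and the choice of random punctuation as the "garbled" replacement are likewise illustrative.

```python
import json
import random
import string
import zlib

SENSITIVE_WORDS = {"secret", "password"}  # hypothetical word list


def pack(result):
    """Distribution server side: serialize and compress the query result."""
    return zlib.compress(json.dumps(result).encode("utf-8"))


def unpack(blob):
    """Relay server side: decompress and parse the query result."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def garble(length):
    """Produce unrecognizable replacement text of the given length."""
    return "".join(random.choice(string.punctuation) for _ in range(length))


def mask_sensitive(rows):
    """Replace every string field that contains a sensitive word."""
    masked = []
    for row in rows:
        clean = {}
        for field, value in row.items():
            hit = isinstance(value, str) and any(
                word in value.lower() for word in SENSITIVE_WORDS)
            clean[field] = garble(len(value)) if hit else value
        masked.append(clean)
    return masked
```

A relay server would chain the calls as `mask_sensitive(unpack(blob))` before sending the target query result to the terminal. Substring matching is the simplest possible filter; larger word lists are usually matched with a trie or an Aho-Corasick automaton.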

In the data search method provided by the present application, the relay server obtains the user's search request, determines the user's operation intention according to the search request, looks up the distribution server required by the search request according to the operation intention, extracts search data from the search request, and sends the search data to the distribution server corresponding to the operation intention. The distribution server splits the search data into a plurality of sub-search data and distributes each of them to a different preset query engine, and each preset query engine obtains the target data corresponding to its sub-search data from the preset database. The distribution server merges the target data to generate a query result. The relay server performs sensitive-word filtering on all fields of the query result and replaces the fields containing sensitive words with unrecognizable target text to generate a target query result, which ensures data security. Finally, the target query result is sent to the user.

The present application determines the user's operation intention from the search request and selects the corresponding distribution server based on that intention, so that the required data is found quickly, a large number of invalid searches are avoided, data query efficiency is improved, and different query modes are supported. In addition, because the sub-search data are distributed to different preset query engines, multiple engines obtain the corresponding target data from the preset database at the same time. This realizes parallel querying and further improves query efficiency.

In one embodiment, the user's operation intention includes a query for business data, each preset query engine is configured with a natural language processing model, and the step in which each preset query engine obtains, based on the received sub-search data, the target data corresponding to the sub-search data from the preset database may specifically include the following.

This embodiment may use the Jieba word segmentation tool to segment the search data entered by the user into multiple words and to determine the start information and end information of the search data. Feature extraction is performed on the words to obtain a feature vector, and the feature vector is passed to the natural language processing model for the data query. After the natural language processing model recognizes the feature vector, the target data corresponding to the feature vector is queried from the preset database. Specifically, the natural language processing model determines the start query entity and the end query entity corresponding to the feature vector, and the preset database is queried for a data chain whose start node is the start query entity and whose end node is the end query entity. The data chain is taken as the target data. For example, suppose the feature vector corresponding to the user's search data is the sentence vector of "Xiao Li meets Zhang San". The natural language processing model parses this sentence vector to obtain the two word vectors "Xiao Li" and "Zhang San", and determines that the start query entity is "Xiao Li" and the end query entity is "Zhang San". The data chain finally queried from the database is: "Xiao Li" -> "park" -> "lakeside" -> "Zhang San".
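One way to picture the data-chain lookup is a path search over linked entities. The sketch below assumes the preset database can be viewed as a directed graph stored as an adjacency dictionary and uses breadth-first search; the application does not say how the chain is stored or searched, so `GRAPH` and `find_chain` are purely illustrative.

```python
from collections import deque

# Hypothetical entity links mirroring the example in the text.
GRAPH = {
    "Xiao Li": ["park"],
    "park": ["lakeside"],
    "lakeside": ["Zhang San"],
    "Zhang San": [],
}


def find_chain(graph, start, end):
    """Breadth-first search for a chain of nodes from `start` to `end`."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain connects the two entities


chain = find_chain(GRAPH, "Xiao Li", "Zhang San")
```

Breadth-first search returns a chain with the fewest hops; if the links are directed, as here, the reverse query from "Zhang San" to "Xiao Li" finds nothing.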

In one embodiment, the sub-search data is text, and the step in which each preset query engine performs feature extraction on the sub-search data may specifically include:

In this embodiment, the Jieba word segmentation tool may be used to segment the text, yielding a segmentation result made up of multiple words. Part-of-speech tagging is then performed on the words in the segmentation result to obtain a tagging result, for example by tagging the words as adjectives, verbs, adverbs, modal particles, and so on. Finally, the keywords of the text are determined from the tagging result, and a vector tool converts the keywords into word vectors to obtain the feature vector. For example, the verbs in the text may be marked as keywords and converted into the feature vector.

In one embodiment, before the feature vector is input into the natural language processing model, the method may further include:

In this embodiment, data related to the AI platform may be collected, including model-related data, API (Application Program Interface) data, SDK (Software Development Kit) data, doctor data, medical data, disease data, and so on. Sensitive words at the company and national level are collected as the security-related portion of the training data. Data related to the operation of the data middle platform, including user data, permission data, team data, menu data, and so on, is collected as supporting data for secondary in-site search. All of the collected data is then used as sample data.

In one embodiment, the sample data may be stored in a medical cloud. A medical cloud is a medical and health service cloud platform created with cloud computing on the basis of new technologies such as cloud computing, mobile technology, multimedia, 4G communication, big data, and the Internet of Things, combined with medical technology; it enables the sharing of medical resources and extends the reach of medical services. Through the application of cloud computing, the medical cloud improves the efficiency of medical institutions and makes it easier for residents to seek medical treatment. Hospital appointment registration, electronic medical records, and medical insurance are all products of combining cloud computing with the medical field. The medical cloud also offers the advantages of data security, information sharing, dynamic expansion, and global deployment.

In one embodiment, the weight calculation method of the bag-of-words model may be used to extract features from the collected sample data (AI platform data, security data, middle-platform data, and so on) to form sample feature vectors. A machine learning algorithm then inputs the sample feature vectors and the expected standard query results into an initial natural language processing model for multiple rounds of training. After each round, the loss value of the initial natural language processing model is calculated and compared with a preset loss value. If the loss value is lower than the preset loss value, the training result output by the initial natural language processing model is judged to meet the requirements.

If not, the parameters of the initial natural language processing model are adjusted and the model is trained again, until the loss value falls below the preset loss value; the initial natural language processing model at that point is used as the natural language processing model.
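The train-until-threshold procedure can be sketched as follows. This is an illustration under stated assumptions, not the model of the application: a tiny logistic regression stands in for the natural language processing model, raw word counts stand in for the bag-of-words weights, and the names `bag_of_words`, `train_until_threshold`, `preset_loss`, the vocabulary, and the two toy samples are all hypothetical.

```python
import math


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in tokens."""
    return [float(tokens.count(word)) for word in vocabulary]


def train_until_threshold(samples, labels, preset_loss=0.2, lr=0.5, max_rounds=10_000):
    """Train a logistic-regression stand-in until the mean loss < preset_loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    loss = float("inf")
    for _ in range(max_rounds):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            for i, xi in enumerate(x):
                grad_w[i] += (p - y) * xi
            grad_b += p - y
        loss /= len(samples)
        if loss < preset_loss:
            break  # training result meets the requirement
        # Otherwise adjust the parameters and train again.
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias, loss


VOCABULARY = ["model", "menu"]  # hypothetical vocabulary
SAMPLES = [
    bag_of_words(["model", "api"], VOCABULARY),
    bag_of_words(["menu", "team"], VOCABULARY),
]
LABELS = [1, 0]  # hypothetical expected results

weights, bias, final_loss = train_until_threshold(SAMPLES, LABELS)
print(final_loss < 0.2)  # True
```

The `max_rounds` cap is an added safeguard so the loop terminates even if the preset loss value is never reached.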

In one embodiment, the distribution server corresponding to the operation intention includes a self-service SQL query service, and the step of sending the search data to the distribution server corresponding to the operation intention may specifically include:

In this embodiment, a security verification module may be configured on the relay server, and a security service unit and a filtering service unit may be added to the security verification module to identify and filter sensitive words in the data. It works as follows: the format of the input search data is checked to determine whether the search data is a SQL (Structured Query Language) query statement. If so, the request is linked to the self-service distribution server. If the search data is not entirely a SQL query statement, the business logic of the NLU (Natural Language Understanding) service is executed to obtain the query result. All fields of the query result are then filtered for sensitive words, fields containing sensitive words are replaced with unrecognizable target text, and the target query result is finally generated.
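The format check and the sensitive-word filter might look like the sketch below. The patent does not specify how a SQL statement is recognized or what the replacement text is, so the leading-keyword regular expression `SQL_PATTERN`, the route labels, and the `"****"` mask are assumptions made for illustration.

```python
import re

# Heuristic assumption: a statement is treated as SQL if it starts with a SQL verb.
SQL_PATTERN = re.compile(r"^\s*(select|insert|update|delete|with)\b", re.IGNORECASE)


def route_search_data(search_data):
    """Pick the service that should handle the search data."""
    return "self_service_sql" if SQL_PATTERN.match(search_data) else "nlu"


def mask_sensitive_fields(rows, sensitive_words, mask="****"):
    """Replace every field whose value contains a sensitive word with mask."""
    masked = []
    for row in rows:
        masked.append({
            key: mask if any(word in str(value) for word in sensitive_words) else value
            for key, value in row.items()
        })
    return masked


print(route_search_data("SELECT name FROM doctors"))  # self_service_sql
print(route_search_data("doctor schedule menu"))      # nlu
print(mask_sensitive_fields([{"name": "Alice", "note": "secret plan"}], ["secret"]))
# [{'name': 'Alice', 'note': '****'}]
```

A production check would more likely hand the text to a SQL parser than rely on a keyword prefix; the regular expression only marks where that decision sits in the flow.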

Structured Query Language, SQL for short, is a special-purpose programming language for database querying and programming, used to access data and to query, update, and manage relational database systems. In the embodiments of the present application, a SQL query statement is a query statement written in Structured Query Language and used to query big data.

After receiving a SQL query statement, the self-service SQL query service parses it to obtain the corresponding execution plan. The execution plan contains metadata such as the query tables, filter conditions, and file field information that the data query task needs to access, so extracting the metadata from the execution plan yields the query tables and filtering information of that plan.

Here, filtering information refers to the filter conditions and screening information in the SQL query statement that restrict the files to be queried. The filter conditions and screening information may include date screening information and file field information, respectively. They provide screening conditions for the subsequent data query, which narrows the file scanning range and thereby improves the efficiency of querying big data.

A preset matching rule is then obtained, and the SQL query statement is matched against it to obtain a matched SQL query statement. The preset matching rule may be a SQL matching rule set in advance by developers; its purpose is to optimize the SQL query statement so that the big data query engine can subsequently run the query. The matching of the SQL query statement against the preset matching rule may be carried out by a SQL optimizer.

Specifically, after obtaining the SQL query statement, the server obtains the preset matching rule corresponding to that statement. The preset SQL optimizer then matches the SQL query statement according to the preset matching rule, that is, the preset matching rule transforms the SQL query statement, yielding the matched target SQL query statement.

Finally, the query engine is called to run the target SQL query statement. The query engine parses the target SQL query statement, obtains the relevant information of its target query table, derives the file scanning range, and scans the file information in that range to obtain the query result, which is returned to the user. This greatly narrows the data query range and helps improve the efficiency of querying big data.

In one embodiment, the step of sending the target query result to the user may specifically include:

In this embodiment, user permission checks are performed on data involving company security, strategy, customer information, and the like. Whenever an unauthorized person attempts to query such data by illegitimate means, a no-access response is returned. If the user has no permission, the search is revoked and the following feedback is returned: {type: 'DBSQLSearchNoAuth', data: '暂无搜索权限'} (the data string means "no search permission at present"). Alternatively, the data in the target query result that falls outside the user's permission scope is deleted, and the target query result with that data removed is sent to the user, improving data security.
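A minimal sketch of the permission check follows. The source presents revoking the search and deleting out-of-scope data as alternatives; this sketch combines them as one possible policy. The per-row `scope` field, the function name `apply_permissions`, and the success type `'DBSQLSearchResult'` are hypothetical; only the `'DBSQLSearchNoAuth'` feedback comes from the text.

```python
NO_AUTH_FEEDBACK = {"type": "DBSQLSearchNoAuth", "data": "暂无搜索权限"}


def apply_permissions(user_scopes, target_result):
    """Drop rows outside the user's permission scope; revoke if nothing is left.

    Each row is assumed to carry a hypothetical "scope" field naming the
    permission it requires.
    """
    allowed = [row for row in target_result if row["scope"] in user_scopes]
    if not allowed:
        return NO_AUTH_FEEDBACK
    # "DBSQLSearchResult" is an illustrative type name, not from the source.
    return {"type": "DBSQLSearchResult", "data": allowed}


rows = [
    {"scope": "public", "value": "clinic opening hours"},
    {"scope": "strategy", "value": "expansion plan"},
]
print(apply_permissions({"public"}, rows))
# {'type': 'DBSQLSearchResult', 'data': [{'scope': 'public', 'value': 'clinic opening hours'}]}
print(apply_permissions(set(), rows)["type"])  # DBSQLSearchNoAuth
```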

In this embodiment, the relay server performs structured encapsulation of the target query result. Structured encapsulation creates a structure that matches the interface requirements of the user's terminal and re-encapsulates the data after data identification, so that the terminal can process, use, and analyze the target query result normally after retrieving it. After encapsulating the target query result, the relay server returns the target query result together with its data type to the user's terminal, so that the terminal can display results of different types differently.

Referring to FIG. 2, an embodiment of the present application further provides a data search system, including:

a relay server 11, configured to obtain a user's search request, determine the user's operation intention according to the search request, look up the distribution server required by the search request according to the operation intention, extract search data from the search request, and send the search data to the distribution server corresponding to the operation intention;

a distribution server 12, configured to split the received search data into multiple pieces of sub-search data and distribute the pieces of sub-search data to different preset query engines;

multiple preset query engines 13, each preset query engine 13 being configured to obtain, based on the received sub-search data, the target data corresponding to the sub-search data from the preset database, and to send the target data to the distribution server 12;

the distribution server 12 being further configured to receive the target data sent by each preset query engine 13, merge all of the target data to generate a query result, and send the query result to the relay server 11;

the relay server 11 being further configured to filter all fields of the query result for sensitive words, replace fields containing sensitive words with unrecognizable target text, generate a target query result, and send the target query result to the user.

To allow a single search box to support SQL (Structured Query Language) execution, business data queries, table queries, field queries, system menu queries, product data queries, permission data queries, and so on, the present application introduces different distribution servers 12 that detect and judge the type of business search. In actual use, the application identifies the current user's specific operation intention from the search request: whether the user needs to access an interface menu, query table data in the database, query a specific piece of the company's business data, query certain fields of a table, execute a self-service SQL statement, and so on. Once the user's operation intention is determined, it is handled accordingly, and the corresponding distribution server 12 is used to obtain the data. If a self-service SQL statement needs to be executed, the business logic of the self-service query service is executed. If the user is querying business interface data, jumping quickly to an interface, and so on, the NLU (Natural Language Understanding) service matches the user's operation intention to a query result.

Specifically, the relay server 11 is directly connected to the user's terminal; it receives the user's search request and returns the search result to the user. The user may enter search conditions in the search bar to initiate a search request, which the relay server 11 obtains. The search request may include the user's identity information, user permissions, and search data, and the search data may be text, voice, pictures, and so on.

Here, the operation intention refers to what data the user wants to obtain or where the user wants to obtain it from. The operation intention may be determined from the user information or from the data type of the search data: a relationship list between operation intentions and user information or data types may be built in advance, and the user's operation intention is determined from that list. For example, after a search request is received, the user's operation intention may be looked up in the relationship list according to the data type of the search data in the request, such as whether the user needs to access an interface menu, query table data in the database, query a specific piece of the company's business data, query certain fields of a table, or execute a self-service SQL statement. For example, when the data type is a business data type, the user's operation intention is to obtain a specific piece of the company's business data from the distribution server 12 corresponding to that business data type. As another example, the user information in the search request may be extracted and used to find the distribution server 12 that the user has used most often in the past, which becomes the target distribution server; calling the target distribution server to obtain data is then taken as the user's operation intention. The corresponding data search is carried out once the user's operation intention is clear, which narrows the search range and improves search efficiency.
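The relationship list can be pictured as a lookup table keyed by data type, with the most frequently used distribution server as a fallback. This is a sketch under assumptions: the data type labels, intention names, and server names in `INTENT_BY_DATA_TYPE` are invented for illustration, and the order in which the two strategies are tried is a design choice that the source does not prescribe.

```python
from collections import Counter

# Hypothetical relationship list: data type -> (operation intention, distribution server).
INTENT_BY_DATA_TYPE = {
    "sql": ("run_self_service_sql", "sql_distribution_server"),
    "business_data": ("query_business_data", "nlu_distribution_server"),
    "menu": ("open_interface_menu", "nlu_distribution_server"),
}


def resolve_intent(data_type, usage_history=()):
    """Return (intention, server) from the relationship list, else from history."""
    if data_type in INTENT_BY_DATA_TYPE:
        return INTENT_BY_DATA_TYPE[data_type]
    if usage_history:
        # Fall back to the server this user has used most often.
        server, _count = Counter(usage_history).most_common(1)[0]
        return ("use_most_frequent_server", server)
    return None


print(resolve_intent("business_data"))
# ('query_business_data', 'nlu_distribution_server')
print(resolve_intent("unknown", ["sql_distribution_server", "nlu_distribution_server",
                                 "sql_distribution_server"]))
# ('use_most_frequent_server', 'sql_distribution_server')
```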

Different operation intentions therefore correspond to different distribution servers 12. The distribution server 12 matching the user's operation intention is selected and data is obtained through it, which avoids a large number of invalid searches and improves search efficiency. The distribution server 12 is connected to the preset query engines 13; it assigns the split search data to the corresponding preset query engines 13 for querying and receives the query results that match the search data. A query result may be a sentence or a passage of text. For example, if the search data consists of several keywords, the query result may be text that contains at least one of the keywords.

The pieces of sub-search data may be distributed to the preset query engines 13 through message middleware, so that the preset query engines 13 can obtain the corresponding target data from the preset database. There may be multiple preset query engines 13. When there are, the sub-search data may be divided evenly among the different preset query engines 13; each preset query engine 13 obtains the target data corresponding to its sub-search data from the preset database and sends the target data to the distribution server 12.
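Even distribution and the later merge can be sketched as below. The message middleware is omitted, and round-robin assignment is one assumed way of dividing the sub-search data evenly; the function names `distribute_evenly` and `merge_target_data` are hypothetical.

```python
def distribute_evenly(sub_search_data, engine_count):
    """Assign sub-search items to engines in round-robin order."""
    batches = [[] for _ in range(engine_count)]
    for index, item in enumerate(sub_search_data):
        batches[index % engine_count].append(item)
    return batches


def merge_target_data(results_per_engine):
    """Flatten the per-engine target data into one query result list."""
    return [record for results in results_per_engine for record in results]


batches = distribute_evenly(["a", "b", "c", "d", "e"], 2)
print(batches)  # [['a', 'c', 'e'], ['b', 'd']]
print(merge_target_data([["r1", "r2"], ["r3"]]))  # ['r1', 'r2', 'r3']
```

With round-robin assignment the batch sizes differ by at most one item, which is what "evenly" is taken to mean here.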

In this embodiment, the preset query engine 13 interacts directly with the preset database and is responsible for data access. During data access, the preset query engine 13 first converts the sub-search data and submits it to a query executor. The query executor issues queries concurrently to each cluster of the preset database; as soon as the query to one cluster succeeds, the queries to the other clusters are ended and the target data is returned.
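The "first successful cluster wins" behaviour of the query executor can be approximated with a thread pool. This is a sketch only: `query_cluster` simulates a cluster with a delay and a success flag, both hypothetical, and Python threads that are already running cannot be interrupted, so the remaining queries are abandoned rather than truly terminated.

```python
import concurrent.futures
import time


def query_cluster(name, delay, succeed):
    """Simulated cluster query (hypothetical stand-in for a database call)."""
    time.sleep(delay)
    if not succeed:
        raise RuntimeError(f"{name} failed")
    return f"target data from {name}"


def query_first_success(clusters):
    """Query all clusters concurrently and return the first successful result."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(clusters))
    futures = [pool.submit(query_cluster, *cluster) for cluster in clusters]
    try:
        for future in concurrent.futures.as_completed(futures):
            if future.exception() is None:
                return future.result()
        return None  # every cluster failed
    finally:
        for future in futures:
            future.cancel()  # only affects queries that have not started
        pool.shutdown(wait=False)


clusters = [("cluster-a", 0.2, True), ("cluster-b", 0.01, False), ("cluster-c", 0.05, True)]
print(query_first_success(clusters))  # target data from cluster-c
```

A real executor would pass a cancellation token or close the connection to stop the losing queries on the database side.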

The distribution server 12 receives the target data sent by each preset query engine 13, merges the target data to generate a query result, and sends the query result to the relay server 11.

By setting up the preset message middleware, this embodiment provides a high-throughput, highly reliable message channel between the distribution server 12 and the query engines, meeting the growing demand for data queries.

After generating the query result, the distribution server 12 compresses it and sends it to the relay server 11. The relay server 11 of this embodiment receives the query result sent by the distribution server 12, decompresses it, and extracts the data. It then filters all fields of the query result for sensitive words and replaces fields containing sensitive words with unrecognizable text, such as garbled characters, to obtain the target query result. Finally, it sends the target query result to the user, and the target query result is displayed as a list on the user's terminal.
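The compress-then-decompress hand-off between the distribution server and the relay server could be sketched as follows. The patent names neither the serialization format nor the compression algorithm, so JSON and zlib are assumptions, as are the function names.

```python
import json
import zlib


def compress_result(query_result):
    """Distribution-server side: serialize to JSON and compress with zlib."""
    raw = json.dumps(query_result, ensure_ascii=False).encode("utf-8")
    return zlib.compress(raw)


def decompress_result(payload):
    """Relay-server side: decompress and parse back into Python objects."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


query_result = [{"doctor": "Zhang San", "department": "Cardiology"}] * 50
payload = compress_result(query_result)
restored = decompress_result(payload)
print(restored == query_result)  # True
```

The sensitive-word filtering that the relay server applies would run on the restored rows after this step.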

Preferably, the user's operation intention includes a query for business data, and each preset query engine 13 is configured with a natural language processing model; wherein

each preset query engine 13 is further configured to perform feature extraction on the sub-search data, convert the result into a feature vector, and input the feature vector into the natural language processing model to obtain the target data; the natural language processing model is a pre-trained neural network model used to obtain the target data from the preset database according to the feature vector.

As described above, it will be understood that each component of the data search system proposed in the present application can implement the functions of any of the data search methods described above; the specific structure is not repeated here.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a data search method applied to a data search system that includes a relay server, a distribution server, and preset query engines. The data search method includes the steps of:

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium provided in the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

In summary, the main beneficial effects of the present application are as follows:

In the data search method, system, and storage medium provided by the present application, the relay server obtains the user's search request, determines the user's operation intention according to the search request, looks up the distribution server required by the search request according to the operation intention, extracts the search data from the search request, and sends the search data to the distribution server corresponding to the operation intention. The distribution server splits the search data into multiple pieces of sub-search data and distributes them to different preset query engines, and the preset query engines obtain the corresponding target data from the preset database. The distribution server merges the target data to generate a query result. The relay server filters all fields of the query result for sensitive words and replaces fields containing sensitive words with unrecognizable target text to generate the target query result, which protects data security; finally, the target query result is sent to the user. The present application determines the user's operation intention from the search request and selects the corresponding distribution server for each operation intention, so that the required data is found quickly, a large number of invalid searches is avoided, data query efficiency is improved, and different query methods are supported. In addition, because the pieces of sub-search data are distributed to different preset query engines, multiple preset query engines obtain the corresponding target data from the preset database at the same time; this parallel querying further improves query efficiency.

It should be noted that, herein, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element qualified by the phrase "including a ..." does not preclude the presence of additional identical elements in the process, device, article, or method that includes that element.

The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims

1. A data search method applied to a data search system, the data search system comprising a relay server, a distribution server, and preset query engines, characterized in that the data search method comprises:

2. The method according to claim 1, characterized in that the user's operation intention includes a query for business data, each preset query engine is configured with a natural language processing model, and the step in which each preset query engine obtains, based on the received sub-search data, the target data corresponding to the sub-search data from the preset database comprises:

3. The method according to claim 2, characterized in that the sub-search data is text, and the step in which each preset query engine performs feature extraction on the sub-search data comprises:

4. The method according to claim 2, characterized in that, before the feature vector is input into the natural language processing model, the method further comprises:

5. The method according to claim 1, characterized in that the distribution server corresponding to the operation intention includes a self-service SQL query service, and the step of sending the search data to the distribution server corresponding to the operation intention comprises:

6. The method according to claim 1, characterized in that the step of sending the target query result to the user comprises:

7. The method according to claim 1, characterized in that the step of sending the target query result to the user comprises:

8. A data search system, characterized by comprising:

9. The data search system according to claim 8, characterized in that the user's operation intention includes a query for business data, and each preset query engine is configured with a natural language processing model; wherein

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the data search method according to any one of claims 1-7.