CN114547104A

Info

Publication number: CN114547104A
Application number: CN202210151017.1A
Authority: CN
Inventors: 戴文鹏
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2022-02-14
Filing date: 2022-02-14
Publication date: 2022-05-27

Abstract

The application belongs to the technical field of data query and provides a log data query method, a device, computer equipment and a storage medium. The method comprises: performing statistical analysis on target log data to obtain statistical data; classifying the statistical data according to data type; storing the statistical data of different data types into the data warehouses of different search engines; obtaining the original query strategy of each search engine; adjusting the query strategy of each search engine according to the corresponding data type to obtain a target query strategy; configuring the target query strategy into the corresponding search engine; receiving a data query request from a user; determining the target search engine according to the data query request; querying according to the target query strategy of the target search engine to obtain the statistical data; and sending the statistical data to the user. As a result, when statistical data needs to be queried, the data warehouses of all the search engines do not need to be traversed, the user can conveniently and quickly search the statistical data of different data types, and data query efficiency is improved.

Description

Log data query method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of data query technologies, and in particular, to a log data query method, apparatus, computer device, and storage medium.

Background

With the rapid development of the internet, the volume of log data on the internet has grown rapidly, so service platforms store more and more data and face increasing storage pressure.

In the prior art, after log data is stored in the data warehouses of the service platform, because the query strategies adopted by the search engines corresponding to each data warehouse are the same, when the log data of a specific data type needs to be queried, the data warehouses of all the search engines need to be traversed, and the query efficiency is low.

Disclosure of Invention

The application mainly aims to provide a log data query method, a log data query device, a computer device and a storage medium, so as to improve query efficiency.

In order to achieve the above object, the present application provides a log data query method, which includes:

receiving log data from different data sources;

preprocessing the log data to obtain target log data;

performing statistical analysis on the target log data to obtain statistical data;

classifying the statistical data according to data types;

storing statistical data of different data types into data warehouses of different search engines, and acquiring the original query strategy of each search engine;

adjusting the query strategy of each search engine according to the corresponding data type to obtain a target query strategy of each search engine;

configuring a target query strategy of each search engine into the corresponding search engine;

receiving a data query request of a user, determining a target search engine according to the data query request, and querying according to a target query strategy of the target search engine to obtain statistical data, thereby obtaining target data;

and sending the target data to the user.

Further, before the adjusting the query policy of each search engine according to the corresponding data type to obtain the target query policy of each search engine, the method further includes:

acquiring a query range and a query sequence specified in the query strategy of each search engine;

determining the storage position of the statistical data of each data type in a data warehouse according to the data types;

and correspondingly adjusting the query range and the query sequence specified in the query strategy of each search engine according to the storage position of each data type so as to query the statistical data of the storage position of the corresponding data type preferentially.

Preferably, the determining a target search engine according to the data query request, and querying according to a target query policy of the target search engine to obtain statistical data to obtain target data includes:

extracting query information carried by the data query request;

determining a search engine corresponding to the data query request according to the query information to obtain a target search engine;

and sending the data query request to the target search engine, and receiving statistical data obtained by the target search engine through querying from the data warehouse according to a target query strategy to obtain target data.

Preferably, the receiving log data from different data sources includes:

acquiring log data scattered in a plurality of data sources in a log mode, an SDK mode or an MQ mode; or

extracting log data pre-stored to a data table from the data table of a database.

Preferably, the preprocessing the log data to obtain target log data includes:

cleaning the log data;

and sending the cleaned log data to a Kafka cluster according to the category to obtain target log data of a preset field.

Preferably, the preprocessing the log data to obtain target log data includes:

acquiring the generation time of the log data;

judging whether the generation time exceeds preset generation time or not;

and if so, deleting the log data exceeding the preset generation time, and taking the residual log data as target log data.

Preferably, the performing statistical analysis on the target log data to obtain statistical data includes:

setting a plurality of analysis dimensions;

splitting the target log data according to the plurality of analysis dimensions to obtain the target log data corresponding to each analysis dimension;

and respectively carrying out statistical analysis on the target log data corresponding to each analysis dimension to obtain statistical data corresponding to each analysis dimension.

The present application further provides a log data query device, which includes:

the receiving module is used for receiving log data from different data sources;

the preprocessing module is used for preprocessing the log data to obtain target log data;

the statistical analysis module is used for performing statistical analysis on the target log data to obtain statistical data;

the classification module is used for classifying the statistical data according to data types;

the storage module is used for storing the statistical data of different data types into data warehouses of different search engines and acquiring the original query strategy of each search engine;

the adjusting module is used for adjusting the query strategy of each search engine according to the corresponding data type to obtain a target query strategy of each search engine;

the configuration module is used for configuring the target query strategy of each search engine into the corresponding search engine;

the query module is used for receiving a data query request of a user, determining a target search engine according to the data query request, and querying according to a target query strategy of the target search engine to obtain statistical data and obtain target data;

and the sending module is used for sending the target data to the user.

The present application further provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.

The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.

According to the log data query method, the log data query device, the computer equipment and the storage medium provided by the application, log data from different data sources are received, the log data are preprocessed to obtain target log data, and the target log data are subjected to statistical analysis to obtain statistical data. The statistical data are classified according to data type, the statistical data of different data types are stored into the data warehouses of different search engines, and the original query strategy of each search engine is acquired. The query strategy of each search engine is adjusted according to the corresponding data type to obtain a target query strategy, which is configured into the corresponding search engine. When a data query request of a user is received, the target search engine is determined according to the request, a query is performed according to the target query strategy of the target search engine to obtain the statistical data as target data, and the target data are sent to the user. Because the statistical data of different data types are stored in different search engines and the query strategy of each search engine is adjusted to its data type, the data warehouses of all the search engines do not need to be traversed when statistical data are queried, so the user can quickly search the statistical data of different data types and data query efficiency is improved.

Drawings

Fig. 1 is a schematic flowchart of a log data query method according to an embodiment of the present application;

Fig. 2 is a block diagram illustrating a log data query apparatus according to an embodiment of the present application;

Fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.

The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The log data query method provided by the application is executed by a server, which may be an independent server or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.

The log data query method addresses the technical problem that the search engines corresponding to the data warehouses currently all adopt the same query strategy, so that when log data of a specific data type needs to be queried, the data warehouses of all the search engines must be traversed and query efficiency is low. Referring to Fig. 1, in one embodiment, the log data query method includes:

S11, receiving log data from different data sources;

S12, preprocessing the log data to obtain target log data;

S13, carrying out statistical analysis on the target log data to obtain statistical data;

S14, classifying the statistical data according to data types;

S15, storing the statistical data of different data types into data warehouses of different search engines, and acquiring the original query strategy of each search engine;

S16, adjusting the query strategy of each search engine according to the corresponding data type to obtain a target query strategy of each search engine;

S17, configuring the target query strategy of each search engine into the corresponding search engine;

S18, receiving a data query request of a user, determining a target search engine according to the data query request, and querying according to a target query strategy of the target search engine to obtain statistical data to obtain target data;

and S19, sending the target data to the user.

Before receiving log data from different data sources, this embodiment may synchronize the relevant data tables to be queried into the Flink data warehouse through the Kafka cluster, where the data tables record the log data of users.

Specifically, this embodiment receives log data from different data sources. Log data are records of process events generated by a system, and may include a web page browsed by the user, a video the user liked, or a login interface. By examining the log data, it is possible to know which user performed which specific operation, at what time, on which device or in which application system.

In this embodiment, the log data is preprocessed, for example by cleaning out meaningless or incorrectly formatted data, to obtain target log data. Flink is then called to perform statistical analysis on the target log data to obtain statistical data, for example the user's browsing count, like count, and the like.

Then, the statistical data are classified according to data type, and the statistical data of different data types are stored in different search engines. The original query strategy of each search engine is obtained and adjusted according to the corresponding data type to obtain a target query strategy, and the target query strategy is configured into the search engine. For example, logic that reads data directly from Oracle is changed into logic that reads from an ES search engine, which greatly improves computational efficiency.

When a data query request of a user is received, this embodiment determines the target search engine to be invoked according to the request, for example by looking up the search engine that corresponds to the data type carried in the request, so that the target search engine is determined quickly. The statistical data are then obtained according to the target query strategy of the target search engine to yield the target data, and the target data are sent to the user, which improves data query efficiency.

The data type includes any one of: keywords, functions, and metadata, where the metadata comprises tables, fields and libraries. A keyword is a specially defined identifier in a computer language and is sometimes called a reserved word. A function is a piece of program or code, also called a subroutine, that can be directly referenced by another piece of program or code. Metadata, also called intermediate data or relay data, is data that describes data, mainly information describing data attributes, and supports functions such as indicating storage locations, history data, resource search, and file recording. A library is a database, that is, a warehouse for storing data. A table is an object used to store data in a database; it is a collection of structured data and a basic element of the database. A field is a member that represents a variable associated with an object or class. In a database, the columns of a table are typically referred to as fields, and each field holds information on one topic. For example, the names and contact numbers in an address book database are attributes common to all rows in the table, so these columns are referred to as the name field and the contact number field.

Flink is an efficient distributed data processing platform based on in-memory computing and is one of the top-level projects of Apache. Its core is a streaming dataflow engine, which provides data distribution, communication and fault tolerance for distributed computation over data streams. It is efficient, reliable and scalable, and is compatible with the Hadoop ecosystem. Flink uses DataSet to describe data sets for parallel computation and provides rich processing interfaces such as map, reduce, join and group for those data sets.

The Flink of this embodiment processes log data in a streaming manner, which greatly improves processing efficiency, offers high fault tolerance, and suits real-time data computation scenarios.

In one embodiment, to further ensure the privacy and security of the log data, the log data may also be stored in a node of a blockchain. Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
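The idea of data blocks linked by a cryptographic method can be illustrated with a minimal hash-chain sketch. It covers only the linking and validity check, not consensus, networking or node storage, and the names `make_block` and `verify_chain` and the choice of SHA-256 are assumptions made for illustration, not details given in the application.

```python
import hashlib
import json


def make_block(records, prev_hash):
    """Build a block whose hash covers its records and the previous block's hash."""
    payload = json.dumps({"records": records, "prev_hash": prev_hash}, sort_keys=True)
    return {
        "records": records,
        "prev_hash": prev_hash,
        "hash": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }


def verify_chain(chain):
    """Check that every block's hash is correct and links to its predecessor."""
    prev_hash = "0" * 64  # assumed placeholder hash for the first block
    for block in chain:
        expected = make_block(block["records"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


# Two blocks of log records; the second block references the first block's hash.
genesis = make_block(["user=1 action=login"], "0" * 64)
second = make_block(["user=1 action=view"], genesis["hash"])
chain = [genesis, second]
```

Changing the records of any stored block changes the hash that `verify_chain` recomputes, so the tampering is detected when the chain is checked.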

In the log data query method provided by the application, log data from different data sources are received, the log data are preprocessed to obtain target log data, and the target log data are subjected to statistical analysis to obtain statistical data. The statistical data are classified according to data type, the statistical data of different data types are stored into the data warehouses of different search engines, and the original query strategy of each search engine is acquired. The query strategy of each search engine is adjusted according to the corresponding data type to obtain a target query strategy, which is configured into the corresponding search engine. When a data query request of a user is received, the target search engine is determined according to the request, a query is performed according to the target query strategy of the target search engine to obtain the statistical data as target data, and the target data are sent to the user. Because the statistical data of different data types are stored in different search engines and the query strategy of each search engine is adjusted to its data type, the data warehouses of all the search engines do not need to be traversed when statistical data are queried, so the user can quickly search the statistical data of different data types and data query efficiency is improved.

In an embodiment, before the adjusting the query policy of each search engine according to the corresponding data type to obtain the target query policy of each search engine, the method may further include:

In this embodiment, the query policy defines a query range and a query sequence, such as which data warehouses need to be queried and in which order, or which data tables in a data warehouse need to be queried and in which order. Each data warehouse can record statistical data of one data type, so that when the statistical data of a certain data type needs to be queried, the target search engine to be called is determined first, and the data table or data warehouse to be queried first is then determined, which improves data query efficiency.

In this embodiment, the storage location of the statistical data of each data type in the data warehouse is determined according to the data type, and the query range and the query sequence specified in the query policy of each search engine are correspondingly adjusted according to the storage location of each data type, so as to query the statistical data of the storage location of the corresponding data type with priority. For example, when the statistical data of data type a is stored in data table a in data warehouse 1, then the query policy of the search engine is adjusted, and data warehouse 1 and its data table a may be queried preferentially.
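The adjustment described above can be sketched as a reordering of the locations named in a query policy. This is a minimal illustration that assumes a policy can be modelled as an ordered list of (warehouse, table) pairs; the `QueryPolicy` class, the `adjust_policy` function and the location names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QueryPolicy:
    """Query range and query order for one search engine (illustrative model)."""
    # Ordered list of (warehouse, table) locations; earlier entries are queried first.
    order: list = field(default_factory=list)


def adjust_policy(original, preferred_location):
    """Return a target policy that queries `preferred_location` first and keeps
    the remaining locations in their original relative order."""
    rest = [loc for loc in original.order if loc != preferred_location]
    return QueryPolicy(order=[preferred_location] + rest)


original = QueryPolicy(order=[
    ("warehouse_2", "table_b"),
    ("warehouse_1", "table_a"),
    ("warehouse_3", "table_c"),
])
# Statistical data of data type A is stored in table_a of warehouse_1,
# so that location is moved to the front of the query order.
target = adjust_policy(original, ("warehouse_1", "table_a"))
```

The original policy object is left unchanged, which mirrors the distinction in the text between the original query policy and the target query policy that is configured into the search engine.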

In an embodiment, the determining a target search engine according to the data query request, and querying according to a target query policy of the target search engine to obtain statistical data to obtain target data may specifically include:

extracting query information carried by the data query request;

In this embodiment, the query information includes a data type, and the corresponding search engine is found based on that data type to obtain the target search engine. A comparison table may be constructed in advance that records the data type of the statistical data stored in the data warehouse of each search engine, so that the target search engine is found quickly from the comparison table. The target search engine is then called to query its corresponding data warehouse to obtain the statistical data, and the resulting target data are returned to the user. This avoids traversing all data warehouses, which would reduce query efficiency.
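A comparison table of this kind can be sketched as a dictionary lookup. The example below assumes the data query request is a dictionary carrying a `data_type` key and stands in for each data warehouse with an in-memory list; `COMPARISON_TABLE`, `WAREHOUSES`, `handle_query` and the engine names are hypothetical.

```python
# Hypothetical comparison table: data type -> search engine whose warehouse stores it.
COMPARISON_TABLE = {
    "keyword": "engine_a",
    "function": "engine_b",
    "metadata": "engine_c",
}

# In-memory stand-ins for the data warehouse behind each search engine.
WAREHOUSES = {
    "engine_a": [{"name": "select", "count": 12}],
    "engine_b": [{"name": "sum", "count": 4}],
    "engine_c": [{"name": "user_table", "count": 3}],
}


def handle_query(request):
    """Pick the target search engine from the data type carried in the request
    and read only that engine's warehouse."""
    data_type = request["data_type"]  # query information extracted from the request
    engine = COMPARISON_TABLE.get(data_type)
    if engine is None:
        raise KeyError("no search engine registered for data type: " + str(data_type))
    return engine, WAREHOUSES[engine]


engine, target_data = handle_query({"user": "u1", "data_type": "function"})
```

Only one warehouse is read per request, so the cost of routing does not grow with the number of search engines.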

In an embodiment, the receiving log data from different data sources may specifically include:

In this embodiment, the Kafka cluster can be used to receive users' log data from different data sources and to store the received data. In addition, the Kafka cluster sends the received original event data to a data structured cleaning module. During cleaning, the data structured cleaning module acquires configuration information cached in advance in the Redis system, cleans the log data according to the configuration information to generate structured target log data, and sends the target log data back to the Kafka cluster for storage.

In addition, a log collection tool based on Apache Flume can be used to collect log data in real time, and corresponding calculation and processing can be carried out on the collected log data. The tool is an improvement on Apache Flume and supports practical functions such as log discovery, log aggregation and hot loading of configuration, and log collection can be started or stopped at any time. The collected log messages are produced directly into the Kafka cluster, which has high throughput, high reliability and high availability.

In the log mode, a log data collector reads the newly added content of a specified log file in real time and sends it to a log collection module, which cleans the log data and then sends it to the Kafka cluster. In the SDK mode, an Agent that uploads data is embedded into an application or container serving as the data source; the Agent either uploads the data to a background service, which processes the data before it enters the Kafka cluster, or sends the log data directly to the Kafka cluster. In the MQ mode, a Kafka message queue serves as the data source, and log data are sent directly to the Kafka cluster.
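Reading only the newly added content of a log file, as in the log mode, can be sketched by remembering a byte offset between reads. This is a simplified illustration: the function name `read_new_lines` is hypothetical, the demo uses a temporary file, and a real collector would also need to handle file rotation and partially written lines.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (lines appended since byte `offset`, new offset) for the file at `path`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
        new_offset = f.tell()
    return data.decode("utf-8").splitlines(), new_offset


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("user=1 action=login\n")
    # First read starts at the beginning of the file.
    first, offset = read_new_lines(log_path, 0)
    with open(log_path, "a", encoding="utf-8") as f:
        f.write("user=1 action=view\n")
    # Second read resumes at the saved offset and sees only the appended line.
    second, offset = read_new_lines(log_path, offset)
```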

The Kafka cluster is a distributed message system that supports partitioned storage and multiple replicas. By adopting a publish/subscribe message processing model, it can effectively solve the problem of preserving log data after an agent goes down. The Kafka cluster operates in cluster mode and is formed by a plurality of brokers. A producer sends messages to a specific topic, and consumers subscribed to that topic then consume them by polling. Kafka writes to disk sequentially, which is much faster than writing to disk randomly.

In an embodiment, the preprocessing the log data to obtain target log data includes:

cleaning the log data;

Specifically, the Kafka cluster is topic-oriented. The external source data is mostly high-volume data with complex data types and field types, or raw data from different source databases. If it were pushed directly to developers, inadvertent errors in their handling of it would be hard to avoid, which would impair fast retrieval at the later interface layer, or degrade the project because the database maintains its indexes slowly. Therefore, the external source data is cleaned to remove redundant fields, unreasonable fields, and garbage data that was not previously cleaned out, and is then uniformly integrated and divided into the topics of the Kafka cluster, where each category of data corresponds to a specific topic.

The Kafka cluster can submit messages in batches or compress them, so the message producer incurs almost no performance overhead, and only preliminary unification of the data (unified field naming, unified structure and unified database storage format) is needed to ensure that the data is not lost.
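The cleaning and per-category division described above can be sketched without a running Kafka cluster by using a dictionary of lists as a stand-in for topics. The preset fields in `REQUIRED_FIELDS`, the `logs.` topic prefix and the function names are assumptions for illustration; in a real deployment the append would be replaced by a producer send to the corresponding topic.

```python
from collections import defaultdict

# Hypothetical preset fields that every cleaned record must contain.
REQUIRED_FIELDS = ("user_id", "category", "event")


def clean(record):
    """Return a record reduced to the preset fields, or None if it is unusable."""
    if not isinstance(record, dict):
        return None  # wrong format, for example an unparsed text line
    if any(record.get(k) in (None, "") for k in REQUIRED_FIELDS):
        return None  # a required field is missing or empty
    return {k: record[k] for k in REQUIRED_FIELDS}  # redundant fields are dropped


def route_to_topics(records):
    """Clean each record and append it to the topic named after its category."""
    topics = defaultdict(list)
    for record in records:
        cleaned = clean(record)
        if cleaned is not None:
            topics["logs." + cleaned["category"]].append(cleaned)
    return dict(topics)


raw = [
    {"user_id": 1, "category": "browse", "event": "page_view", "debug": "x"},
    {"user_id": 2, "category": "", "event": "like"},   # empty category: dropped
    "corrupted line",                                   # wrong format: dropped
    {"user_id": 3, "category": "video", "event": "like"},
]
topics = route_to_topics(raw)
```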

acquiring the generation time of the log data;

judging whether the generation time exceeds preset generation time or not;

The embodiment can acquire the generation time of the log data, wherein the generation time can be acquired by a log acquisition tool when the log data is generated and recorded, and the generation time is stored together with the log data.

The generation time is then compared with a preset generation time to judge whether it exceeds the preset generation time. If so, the log data has expired: the log data exceeding the preset generation time is deleted and the remaining log data is taken as the target log data, so that only more recent log data is retained. The preset generation time can be customized; for example, it may be set to one week, and log data older than one week is deleted to ensure the timeliness of the data.
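The expiry check can be sketched as a filter on each record's generation time. The example assumes each record stores its generation time under a `generated_at` key and uses the one-week limit mentioned above as the default; the key name and the `drop_expired` function are hypothetical.

```python
from datetime import datetime, timedelta


def drop_expired(records, now, max_age=timedelta(days=7)):
    """Keep only records whose generation time is within `max_age` of `now`."""
    cutoff = now - max_age
    return [r for r in records if r["generated_at"] >= cutoff]


now = datetime(2022, 2, 14, 12, 0, 0)
logs = [
    {"id": 1, "generated_at": datetime(2022, 2, 13, 9, 0, 0)},  # one day old: kept
    {"id": 2, "generated_at": datetime(2022, 2, 1, 9, 0, 0)},   # about two weeks old: deleted
]
target_logs = drop_expired(logs, now)
```

Passing `now` explicitly rather than reading the system clock inside the function keeps the filter deterministic and easy to test.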

In an embodiment, the performing a statistical analysis on the target log data to obtain statistical data may specifically include:

setting a plurality of analysis dimensions;

The analysis dimensions are the angles from which the log data is analyzed. A plurality of analysis dimensions can be preset, the target log data is split according to those dimensions to obtain the target log data corresponding to each analysis dimension, and statistical analysis is performed on the target log data of each dimension to obtain the corresponding statistical data. For example, when the log data contains web pages browsed by the user, videos liked by the user, music played, and the like, the analysis dimensions may be the user's web page browsing count, the video like count, and the types and number of music tracks played. The statistical data obtained for each analysis dimension makes it possible to understand the user's preferences from multiple dimensions.
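Splitting by analysis dimension and then counting within each dimension can be sketched as follows. The example assumes each log record has an `event` key identifying its dimension; the `DIMENSIONS` mapping, the field names and the `analyze` function are hypothetical, and a production system would run this kind of aggregation in Flink rather than in plain Python.

```python
from collections import Counter, defaultdict

# Hypothetical analysis dimensions: event type -> field whose values are counted.
DIMENSIONS = {
    "page_view": "url",
    "video_like": "video_id",
    "music_play": "genre",
}


def analyze(records):
    """Split records by analysis dimension, then count values within each dimension."""
    split = defaultdict(list)
    for record in records:
        if record["event"] in DIMENSIONS:
            split[record["event"]].append(record)
    return {
        dim: Counter(row[DIMENSIONS[dim]] for row in rows)
        for dim, rows in split.items()
    }


logs = [
    {"event": "page_view", "url": "/home"},
    {"event": "page_view", "url": "/home"},
    {"event": "video_like", "video_id": "v42"},
    {"event": "music_play", "genre": "jazz"},
    {"event": "music_play", "genre": "rock"},
]
stats = analyze(logs)
```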

Referring to Fig. 2, an embodiment of the present application further provides a log data query apparatus, including:

a receiving module 11, configured to receive log data from different data sources;

a preprocessing module 12, configured to preprocess the log data to obtain target log data;

a statistical analysis module 13, configured to perform statistical analysis on the target log data to obtain statistical data;

a classification module 14, configured to classify the statistical data according to data types;

a storage module 15, configured to store statistical data of different data types in data warehouses of different search engines, and obtain an original query strategy of each search engine;

an adjusting module 16, configured to adjust the query policy of each search engine according to a corresponding data type, to obtain a target query policy of each search engine;

a configuration module 17, configured to configure the target query policy of each search engine into the corresponding search engine;

a query module 18, configured to receive a data query request of a user, determine a target search engine according to the data query request, and query according to a target query policy of the target search engine to obtain statistical data to obtain target data;

a sending module 19, configured to send the target data to the user.

The data type includes any one of: keywords, functions, and metadata, where the metadata comprises tables, fields and libraries. A keyword is a specially defined identifier in a computer language and is sometimes called a reserved word. A function is a program or piece of code, also called a subroutine, that can be directly referenced by another program or piece of code. Metadata, also called intermediate data or relay data, is data that describes data, mainly information describing data attributes, and supports functions such as indicating storage locations, history data, resource lookup, and file records. A library is a database, that is, a warehouse for storing data. A table is an object used to store data in a database; it is a collection of structured data and a basic element of the database. A field is a member that represents a variable associated with an object or class. In a database, the columns of a table are typically referred to as fields, and each field holds information on one topic. For example, the names and contact numbers in an address book database are attributes common to all rows in the table, so these columns are referred to as the name field and the contact number field.

Flink is an efficient distributed data processing platform based on in-memory computing and is one of the top-level projects of Apache. Its core is a streaming dataflow engine, which provides data distribution, communication and fault tolerance for distributed computation over data streams. It is efficient, reliable and scalable, and is compatible with the Hadoop ecosystem. Flink uses DataSet to describe data sets for parallel computation and provides rich processing interfaces such as map, reduce, join and group for those data sets.

As described above, it can be understood that each component of the log data query apparatus provided in the present application may implement the function of any one of the log data query methods described above, and the detailed structure is not described again.

Referring to Fig. 3, an embodiment of the present application further provides a computer device, whose internal structure may be as shown in Fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a storage medium and an internal memory. The storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the storage medium. The database of the computer device stores the data related to the log data query method. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a log data query method.

The processor executes the log data query method, and the method comprises the following steps:

receiving log data from different data sources;

preprocessing the log data to obtain target log data;

classifying the statistical data according to data types;

and sending the target data to the user.
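The first two steps listed above, receiving log data from different data sources and preprocessing it into target log data, might look like the following minimal sketch. The entry layout, the `receive` and `preprocess` functions, and the seven-day age limit are hypothetical. In particular, the sketch assumes that cleaning means dropping blank and duplicate entries and that entries older than a preset age are discarded, which the present application does not specify here.

```python
from datetime import datetime, timedelta


def receive(sources):
    """Merge log entries from several data sources into one list."""
    merged = []
    for name, entries in sources.items():
        for entry in entries:
            merged.append({"source": name, **entry})
    return merged


def preprocess(entries, now, max_age):
    """Return target log data: cleaned entries no older than max_age.

    Assumptions: blank messages and exact duplicates are dropped, and
    entries whose generation time is older than max_age are discarded.
    """
    seen = set()
    target = []
    for entry in entries:
        message = entry["message"].strip()
        if not message:
            continue  # drop blank entries
        if now - entry["created"] > max_age:
            continue  # drop entries that are too old
        key = (entry["source"], message, entry["created"])
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        target.append({**entry, "message": message})
    return target


now = datetime(2022, 2, 14, 12, 0)
sources = {
    "app": [
        {"message": "query ok ", "created": datetime(2022, 2, 14, 11, 0)},
        {"message": "   ", "created": datetime(2022, 2, 14, 11, 5)},
    ],
    "db": [
        {"message": "slow query", "created": datetime(2022, 2, 1, 9, 0)},
        {"message": "connection reset", "created": datetime(2022, 2, 14, 10, 0)},
    ],
}
target_log_data = preprocess(receive(sources), now, timedelta(days=7))
```

Here the blank entry and the thirteen-day-old entry are removed, leaving two entries as target log data for the later steps.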

An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements a log data query method, including the steps of:

receiving log data from different data sources;

preprocessing the log data to obtain target log data;

classifying the statistical data according to data types;

and sending the target data to the user.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

To sum up, the most beneficial effect of this application lies in:

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, apparatus, article, or method that includes the element.

The above description is only a preferred embodiment of the present application and is not intended to limit its scope. All equivalent structural or process modifications made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the present application.

Claims

1. A log data query method is characterized by comprising the following steps:

receiving log data from different data sources;

preprocessing the log data to obtain target log data;

classifying the statistical data according to data types;

and sending the target data to the user.

2. The method of claim 1, wherein before adjusting the query policy of each search engine according to the corresponding data type to obtain the target query policy of each search engine, the method further comprises:

3. The method of claim 1, wherein determining a target search engine according to the data query request, and querying for statistical data according to a target query policy of the target search engine to obtain target data comprises:

extracting query information carried by the data query request;

4. The method of claim 1, wherein receiving log data from different data sources comprises:

5. The method of claim 1, wherein the pre-processing the log data to obtain target log data comprises:

cleaning the log data;

6. The method of claim 1, wherein the pre-processing the log data to obtain target log data comprises:

acquiring the generation time of the log data;

judging whether the generation time exceeds a preset generation time;

7. The method of claim 1, wherein the performing a statistical analysis on the target log data to obtain statistical data comprises:

setting a plurality of analysis dimensions;

8. An apparatus for querying log data, comprising:

and the sending module is used for sending the target data to the user.

9. A computer device, comprising:

a processor;

a memory;

wherein the memory stores a computer program which, when executed by the processor, implements the log data query method of any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the log data query method of any one of claims 1 to 7.