CN111666490B - Information pushing method, device, equipment and storage medium based on kafka - Google Patents

Information pushing method, device, equipment and storage medium based on kafka

Info

Publication number
CN111666490B
CN111666490B
Authority
CN
China
Prior art keywords
information
kafka
user
users
filtering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010350127.1A
Other languages
Chinese (zh)
Other versions
CN111666490A (en)
Inventor
李强 (Li Qiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010350127.1A
Publication of CN111666490A
Application granted
Publication of CN111666490B
Legal status: Active (current)
Anticipated expiration


Abstract

Translated from Chinese

The embodiments of the present application belong to the field of data analysis and relate to a Kafka-based information push method, apparatus, computer device and storage medium. The method includes: generating group categories based on preset user operation events and dynamic filter condition attributes; constructing tracking events, collecting user data through the tracking events, and storing the user data in the distributed search engine ElasticSearch; processing the user data by configuring the distributed publish-subscribe messaging system Kafka and using the distributed computing engine Spark, to obtain the group category corresponding to each user; and, according to the business information, determining the group category to be recommended from the group categories and pushing the business information to the users included in that category. The present application achieves precise pushing for users of different group categories and improves the efficiency of business information push. The present application also relates to blockchain technology: the user data may be stored in blockchain nodes.

Description

Information push method, apparatus, device and storage medium based on Kafka

Technical Field

The present application relates to the field of data analysis technology, and in particular to a Kafka-based information push method, apparatus, device and storage medium.

Background Art

Information recommendation pushes data that users care about based on their preferences. Accurate recommendation benefits both sides: it helps data providers deliver their information to users, and it helps users obtain the information they actually want. In recent years recommendation systems have become increasingly popular and are used in many fields, including movies, music, news and books. E-commerce platforms, for example, run dedicated recommendation systems to suggest products their customers may like. Properly configured, such a system can effectively increase profits, click-through rates and conversion rates, and give users a better experience.

Existing information recommendation systems collect user data by setting up tracking points ("buried points"), analyze that data to derive user preference information, and push information to users accordingly. However, the collected data is analyzed over the entire user base and the information is pushed to all users. Such pushing is imprecise: the recommended information is often not what a given user actually needs, so the conversion rate of the recommendations is low and recommendation efficiency suffers.

Summary of the Invention

The purpose of the embodiments of the present application is to propose a Kafka-based information push method to improve the efficiency of information recommendation.

To solve the above technical problem, an embodiment of the present application provides a Kafka-based information push method, including:

generating group categories based on preset user operation events and dynamic filter condition attributes;

constructing tracking events, collecting user data of users through the tracking events, and storing the user data in the distributed search engine ElasticSearch;

configuring the parameter information of topics in the distributed publish-subscribe messaging system Kafka and, based on the configured topics, performing data analysis on the user data stored in ElasticSearch and storing the resulting analysis results in message middleware;

using the distributed computing engine Spark to obtain the analysis results from the message middleware, filtering and analyzing them, and writing the filtered analysis results back to the message middleware;

writing the filtered analysis results from the message middleware into Kafka, and parsing them through Kafka to obtain the group category corresponding to each user;

determining, according to business information, the group category to be recommended from the group categories, and pushing the business information to the users included in that group category.

Further, configuring the parameter information of the topics in Kafka and, based on the configured topics, analyzing the user data stored in ElasticSearch and storing the analysis results in the message middleware includes:

configuring the parameter information of the topics in Kafka and analyzing the user data based on the configured topics to obtain analysis results;

encapsulating the analysis results into a JSON string to obtain an analysis-result JSON string, and writing that JSON string into the message middleware.
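The encapsulation step above can be sketched as follows. The function name and the field names of the payload are illustrative assumptions, not the patent's actual schema:

```python
import json

def wrap_analysis_result(group_id, user_ids):
    """Encapsulate one analysis result as a JSON string for the
    message middleware (field names are illustrative)."""
    payload = {
        "groupId": group_id,          # group category identifier
        "userIds": sorted(user_ids),  # users matched to the category
        "count": len(user_ids),
    }
    return json.dumps(payload, separators=(",", ":"))
```

The compact separators keep the message small; a consumer reverses the step with `json.loads`.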

Further, using the distributed computing engine Spark to obtain the analysis results from the message middleware, filtering and analyzing them, and writing the filtered analysis results back to the message middleware includes:

encapsulating the analysis results into SQL commands;

executing the SQL commands through Spark, traversing the tables built from the Hive tables to obtain the number of users and the user information for each group category, and saving the user count and user information in the message middleware.

Further, executing the SQL commands through Spark, traversing the tables built from the Hive tables, obtaining the number of users and the user information for each group category, and saving the user count and user information in the message middleware includes:

traversing, via the SQL commands, the user data in the tables built from the Hive library, filtering out duplicate user data, and obtaining a filtered result;

counting the number of users in each group category of the filtered result and matching the corresponding user information, obtaining the user count and the user information, and saving them in the message middleware.
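The de-duplication and counting described above can be illustrated in plain Python, as a stand-in for the SQL that Spark executes over the Hive tables; the record shape is an assumption:

```python
def dedupe_and_count(rows):
    """rows: (user_id, group_category, user_info) tuples, possibly with
    duplicates. Returns {category: (count, {user_id: info})}."""
    seen = set()
    result = {}
    for user_id, category, info in rows:
        if (user_id, category) in seen:   # drop duplicate user data
            continue
        seen.add((user_id, category))
        count, users = result.setdefault(category, (0, {}))
        users[user_id] = info
        result[category] = (count + 1, users)
    return result
```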

Further, using Spark to obtain the analysis results from the message middleware, filtering and analyzing them, and writing the filtered analysis results back to the message middleware also includes:

encapsulating the filtered analysis results into a JSON string to obtain a filtered-analysis-result JSON string;

writing the filtered-analysis-result JSON string back to the message middleware.

Further, after determining the group category to be recommended from the group categories according to the business information and pushing the business information to the users included in that category, the method further includes:

periodically traversing the group categories through the scheduler Quartz and updating the users corresponding to each group category.
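Quartz itself is a Java scheduler; the periodic refresh it drives can be sketched in Python using the standard-library `sched` module. The callback names and interval are assumptions:

```python
import sched
import time

def refresh_groups(groups, evaluate):
    """Re-evaluate membership for every group category.
    `evaluate(category)` returns the current user set (assumed callback)."""
    for category in groups:
        groups[category] = evaluate(category)
    return groups

def schedule_refresh(groups, evaluate, interval_s):
    """Re-run refresh_groups periodically, as Quartz would via a trigger."""
    s = sched.scheduler(time.time, time.sleep)
    def tick():
        refresh_groups(groups, evaluate)
        s.enter(interval_s, 1, tick)   # re-arm for the next interval
    s.enter(interval_s, 1, tick)
    return s  # the caller starts the loop with s.run()
```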

Further, after determining the group category to be recommended from the group categories according to the business information and pushing the business information to the users included in that category, the method further includes:

assigning corresponding labels to the users of each group category according to the attributes from which the group category was generated.
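The labeling step can be sketched as a simple mapping from each user's category to the labels derived from that category's generation attributes; the data shapes are assumptions:

```python
def assign_labels(user_groups, group_attributes):
    """user_groups: {user_id: category}; group_attributes: {category: [labels]}.
    Returns {user_id: [labels]} derived from the category's generation attributes."""
    return {
        user: list(group_attributes.get(category, []))
        for user, category in user_groups.items()
    }
```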

To solve the above technical problem, an embodiment of the present application provides a Kafka-based information push apparatus, including:

a group category generation module, configured to generate group categories based on preset user operation events and dynamic filter condition attributes;

a user data collection module, configured to construct tracking events, collect user data of users through the tracking events, and store the user data in the distributed search engine ElasticSearch;

a user data analysis module, configured to configure the parameter information of topics in Kafka and, based on the configured topics, analyze the user data stored in ElasticSearch and store the resulting analysis results in the message middleware;

a filtering analysis module, configured to use the distributed computing engine Spark to obtain the analysis results from the message middleware, filter and analyze them, and write the filtered analysis results back to the message middleware;

a group category determination module, configured to write the filtered analysis results from the message middleware into Kafka and parse them through Kafka to obtain the group category corresponding to each user;

a business information push module, configured to determine, according to business information, the group category to be recommended from the group categories and push the business information to the users included in that category.

To solve the above technical problem, one technical solution adopted by the present invention is to provide a computer device, including one or more processors and a memory storing one or more programs which, when executed by the one or more processors, implement any of the Kafka-based information push solutions described above.

To solve the above technical problem, another technical solution adopted by the present invention is a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the Kafka-based information push solutions described above.

In the above solutions, the Kafka-based information push method generates group categories from preset user operation events and dynamic filter condition attributes, constructs tracking events, and collects user data through them, which makes it possible to obtain different user data for different group categories; through data analysis with the distributed publish-subscribe messaging system Kafka and the distributed computing engine Spark, the group category corresponding to each user is obtained, enabling precise pushing to users and improving the efficiency of information push.

Brief Description of the Drawings

To illustrate the solutions in the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the application environment of the Kafka-based information push method provided in an embodiment of the present application;

FIG. 2 is a flowchart of an implementation of the Kafka-based information push method provided in an embodiment of the present application;

FIG. 3 is a flowchart of an implementation of step S3 in the Kafka-based information push method provided in an embodiment of the present application;

FIG. 4 is a flowchart of an implementation of step S4 in the Kafka-based information push method provided in an embodiment of the present application;

FIG. 5 is a flowchart of an implementation of step S42 in the Kafka-based information push method provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of the backend management apparatus of the database provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of a computer device provided in an embodiment of the present application.

Detailed Description

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used in the specification are only for describing specific embodiments and are not intended to limit the application. The terms "including" and "having", and any variants thereof, in the specification, claims and drawing descriptions are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the specification, claims and drawings are used to distinguish different objects, not to describe a specific order.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.

The present invention is described in detail below with reference to the accompanying drawings and embodiments.

Referring to FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 provides the medium for the communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links or optical fiber cables.

Users can use the terminal devices 101, 102 and 103 to interact with the server 105 through the network 104 to receive or send messages. Various communication client applications, such as web browsers, search applications and instant messaging tools, can be installed on the terminal devices 101, 102 and 103.

The terminal devices 101, 102 and 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.

The server 105 may be a server providing various services, for example a background server supporting the pages displayed on the terminal devices 101, 102 and 103.

It should be noted that the Kafka-based information push method provided in the embodiments of the present application is generally executed by the server; accordingly, the Kafka-based information push apparatus is generally arranged in the server.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative; any number of terminal devices, networks and servers may be provided according to implementation requirements.

Referring to FIG. 2, FIG. 2 shows a specific implementation of the Kafka-based information push method.

It should be noted that, provided substantially the same result is achieved, the method of the present invention is not limited to the process sequence shown in FIG. 2. The method includes the following steps:

S1: Generate group categories based on preset user operation events and dynamic filter condition attributes.

Specifically, a user operation event is triggered when a user browses a page on the client and interacts with it. The client collects the operation event and reports it to the server, which distinguishes the received operation events through the preset user operation events and dynamic filter condition attributes, thereby establishing the group categories.

The dynamic filter condition attributes are configured together with the number of occurrences of the filter event (>, =, <, !=) and the attributes under the filter event (>, =, <, !=, contains, does not contain). The various events can be combined by intersection or union. The time at which an event occurs supports both fixed time periods (e.g., 2019/07/02-2019/07/25) and dynamic time windows (the last n days, taking the current time as the end time and n days ago as the start time). For example, selecting the event "unpaid" with the condition ">" and the count "5" means "the number of unpaid occurrences is greater than 5". For attributes under an event there are two types: numeric attributes, whose filter conditions are greater than, equal to, less than and not equal to, together with the number of occurrences; and string attributes, whose conditions are contains / does not contain, plus the characters to filter on. For example, selecting the attribute name "product name" with the condition "contains" and the string "insurance", combined with the event, yields "the number of unpaid occurrences of the product is greater than 5 and the product name contains insurance".

The group categories are generated from the preset user operation events and dynamic filter condition attributes and are used to distinguish different user groups.
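The filter logic described above (an event-count comparison, optionally restricted by a string attribute) can be sketched as follows; the operator set and the event record shape are assumptions:

```python
import operator

OPS = {">": operator.gt, "=": operator.eq, "<": operator.lt, "!=": operator.ne}

def matches(events, event_name, op, threshold, attr=None, contains=None):
    """events: list of {"name": ..., "attrs": {...}} records for one user.
    Checks 'count of event_name satisfies op threshold', optionally counting
    only events whose string attribute `attr` contains `contains`."""
    hits = [e for e in events if e["name"] == event_name]
    if attr is not None:
        hits = [e for e in hits if contains in str(e["attrs"].get(attr, ""))]
    return OPS[op](len(hits), threshold)
```

Intersection and union of several such conditions then reduce to `and` / `or` over the boolean results.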

S2: Construct tracking events, collect user data through the tracking events, and store the user data in the distributed search engine ElasticSearch.

Specifically, a tracking ("buried point") event is set for each operation event. When a user browses a page on the client and triggers a user operation event while interacting with the page, the tracking point corresponding to that operation event fires; that is, through the tracking event, the client collects the user data and reports it to the server. When the user clicks on or browses an operation event, the user data is collected through the tracking points and stored in the distributed search engine ElasticSearch.

ElasticSearch is a Lucene-based search server: a distributed, highly scalable, highly real-time search and data analysis engine. It makes it easy to search, analyze and explore large amounts of data, and taking full advantage of its horizontal scalability makes data more valuable in a production environment. In the present invention, since the user data collected through the tracking events is distributed user data, and ElasticSearch is distributed and highly scalable, the data is stored in ElasticSearch.

The user data is the behavioral data collected by the server, through the constructed tracking points, when the user clicks or browses.
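The tracking data written to ElasticSearch can be pictured as one JSON document per event. The field names below are illustrative assumptions; a real deployment would send the document through the ElasticSearch index or bulk API:

```python
import time

def build_tracking_doc(user_id, event_name, attrs, ts=None):
    """Build the document a tracking ("buried point") event would index
    into ElasticSearch (field names are assumptions)."""
    return {
        "user_id": user_id,
        "event": event_name,
        "attrs": attrs,                                   # event attributes
        "timestamp": ts if ts is not None else int(time.time()),
    }
```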

S3: Configure the parameter information of the topics in the distributed publish-subscribe messaging system Kafka and, based on the configured topics, analyze the user data stored in ElasticSearch and store the resulting analysis results in the message middleware.

Specifically, since the user data collected through the tracking events of step S2 is stored in ElasticSearch, after configuring the parameter information of the topics in Kafka so that each topic stores one class of messages, the user data is analyzed, including parsing and encapsulation, and stored in the message middleware, which facilitates further analysis of the user data later.

Kafka is an open-source stream processing platform developed by the Apache Software Foundation, written in Scala and Java. It is a high-throughput distributed publish-subscribe messaging system that can handle all of the action stream data of consumers on a website, data that is usually handled through log processing and log aggregation because of throughput requirements. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism and to provide real-time messages through clusters. In the present invention, by configuring the parameter information of the topics, the server stores the collected user data, in encapsulated form, in the message middleware of Kafka, providing the basis for further analysis of the user data.

A topic is a component of Kafka: a logical concept for storing messages, that is, a message collection. Every message sent to a Kafka cluster has a topic, and messages of different topics are stored separately; each topic can have multiple producers sending messages to it and multiple consumers consuming messages from it. In the present invention, by configuring the parameter information of a topic, messages of a certain class are stored, and the user data is then written into the message middleware through Kafka.
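Configuring a topic's parameter information might look like the following sketch. The partition, replication and retention values are illustrative defaults, not the patent's settings; with Kafka's real admin client this would correspond to a new-topic request plus per-topic configs:

```python
def topic_config(name, partitions=3, replication=2, retention_ms=86_400_000):
    """Assemble the parameter information for one Kafka topic
    (values are illustrative, not the patent's actual settings)."""
    return {
        "topic": name,
        "num.partitions": partitions,         # parallelism of the topic
        "replication.factor": replication,    # copies kept for fault tolerance
        "retention.ms": retention_ms,         # how long messages are kept
    }
```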

Message middleware is suited to distributed environments that require reliable data transfer. In a system using a message middleware mechanism, different objects activate each other's events by passing messages to complete the corresponding operations. Message middleware can communicate across platforms; it is often used to hide the peculiarities of different platforms and protocols and to achieve cooperation between applications. Its advantage is that it provides both synchronous and asynchronous connections between client and server, and messages can be transmitted or stored and forwarded at any time, which is also why it goes a step further than remote procedure calls. In the present invention, the analysis results obtained from the user data are stored in the message middleware.

The analysis results are the user data obtained after preliminary parsing and encapsulation; they can be written into the message middleware.

S4:采用分布式计算引擎Spark,从消息中间件中获取分析结果,并对分析结果进行过滤分析,将得到的过滤分析结果回写到消息中间件中。S4: Use the distributed computing engine Spark to obtain the analysis results from the message middleware, filter and analyze the analysis results, and write the filtered analysis results back to the message middleware.

具体的,通过步骤S3已经用户数据进行数据分析,并将分析结果写入消息中间件,分布式计算引擎Spark通过生成SQL命名,对存储在消息中间件中的分析结果进行过滤分析,并查询在hive生成的表,返回群体类别的用户数量,通过匹配相应的用户信息,得到每个用户相应的用户信息,将过滤分析结果回写消息中间件中。将过滤分析结果回写消息中间件的目的在于分布式发布订阅消息系统kafka能够通过回写中间件,从而得到过滤分析结果。Specifically, in step S3, the user data has been analyzed, and the analysis results are written into the message middleware. The distributed computing engine Spark generates SQL names, filters and analyzes the analysis results stored in the message middleware, and queries the table generated in hive to return the number of users in the group category. By matching the corresponding user information, the corresponding user information of each user is obtained, and the filtering and analysis results are written back to the message middleware. The purpose of writing the filtering and analysis results back to the message middleware is that the distributed publish-subscribe message system kafka can obtain the filtering and analysis results by writing back to the middleware.

其中，分布式计算引擎Spark（Apache Spark）是专为大规模数据处理而设计的快速通用的计算引擎。其优点在于：Job的中间输出结果可以保存在内存中，从而不再需要读写HDFS，因此分布式计算引擎Spark能更好地适用于数据挖掘与机器学习等需要迭代的MapReduce算法。在本发明中，通过分布式计算引擎Spark对分析结果进行过滤分析，过滤掉冗余、异常等数据，识别出用户数量和用户信息，并将过滤分析结果回写于消息中间件。The distributed computing engine Spark (Apache Spark) is a fast, general-purpose computing engine designed for large-scale data processing. Its advantage is that a job's intermediate output can be kept in memory, eliminating the need to read and write HDFS, so Spark is better suited to iterative MapReduce-style algorithms such as data mining and machine learning. In the present invention, Spark filters and analyzes the analysis results, removing redundant and abnormal data, identifying the number of users and the user information, and writing the filtering analysis results back to the message middleware.

S5:将消息中间件中的过滤分析结果,写入到分布式发布订阅消息系统kafka中,通过分布式发布订阅消息系统kafka对过滤分析结果进行解析处理,得到每个用户对应的群体类别。S5: Write the filtering analysis results in the message middleware into the distributed publish-subscribe message system Kafka, and parse and process the filtering analysis results through the distributed publish-subscribe message system Kafka to obtain the group category corresponding to each user.

具体的，通过将消息中间件中的过滤分析结果写入到分布式发布订阅消息系统kafka中，分布式发布订阅消息系统kafka得以消费消息中间件的数据，再根据配置好的分布式发布订阅消息系统kafka中主题topic的参数信息，对过滤分析结果进行解析处理，得到每个用户对应的群体类别。Specifically, by writing the filtering analysis results in the message middleware into the distributed publish-subscribe messaging system kafka, kafka can consume the data of the message middleware and then parse the filtering analysis results according to the configured parameter information of the topic in kafka, obtaining the group category corresponding to each user.
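例如，可参考以下用Python编写的最小示意代码，说明如何解析过滤分析结果并得到每个用户对应的群体类别（消息内容与字段名均为假设，并非本发明的限定实现）：For example, the following minimal sketch in Python illustrates parsing the filtering analysis results to obtain the group category corresponding to each user (the message contents and field names are assumptions, not a limiting implementation of the present invention):

```python
import json

def parse_filtered_results(messages):
    # Each message is assumed to be a JSON string carrying
    # "user_id" and "group_category" fields.
    user_groups = {}
    for msg in messages:
        record = json.loads(msg)
        user_groups[record["user_id"]] = record["group_category"]
    return user_groups

# Hypothetical messages consumed from the kafka topic
msgs = [
    '{"user_id": "u001", "group_category": "cosmetics_interest"}',
    '{"user_id": "u002", "group_category": "sports_interest"}',
]
groups = parse_filtered_results(msgs)
```

实际系统中，消息由kafka消费者读取后按同样方式解析。In a real system, the messages would be read by a kafka consumer and parsed in the same way.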

S6:根据业务信息,从群体类别中,确定待推荐的群体类别,并向待推荐的群体类别包含的用户推送业务信息。S6: Determine the group category to be recommended from the group categories according to the business information, and push the business information to users included in the group category to be recommended.

具体的,将不同的业务信息对应不同的群体类别,由于群体类别所包含的用户已经确定,将对应的业务信息推送给对应的群体类别包含的用户。Specifically, different business information is matched to different group categories. Since the users included in the group categories have been determined, the corresponding business information is pushed to the users included in the corresponding group categories.

例如，业务信息是关于某化妆物品的介绍和优惠政策，所以要找到对应的对化妆物品感兴趣的群体类别，该群体类别已经包含了对其感兴趣的用户。故将该业务信息推送给对化妆物品感兴趣的群体类别的用户，以此提高推送效率。For example, the business information is the introduction and preferential policies of a certain cosmetic product, so the group category interested in cosmetic products needs to be found; that group category already contains the users interested in it. The business information is therefore pushed to the users in the group category interested in cosmetic products, thereby improving push efficiency.

进一步的，推送的手段包括批量同步，支持单次同步和每日增量同步等方式。其中，每日增量方式可以选择增量数据的周期（1、30、60、90天）；选择每日增量同步时，任务每天定时去同步计算群体类别包含的用户数量，以保证数据的一致性。投放后，数据又写入消息中间件中；程序从分布式发布订阅消息系统kafka消费数据后，根据投放方式将计算后的数据写入Hbase（请求访问）或者Tidb（批量同步），又把各渠道对应的用户数写入kafka中，从而得到各渠道下的用户数。Furthermore, the push mechanism includes batch synchronization, supporting single synchronization, daily incremental synchronization, and so on. For daily incremental synchronization, the increment period can be chosen from 1, 30, 60, or 90 days; when it is selected, a task synchronously recomputes the number of users included in each group category at a fixed time every day to keep the data consistent. After delivery, the data is written to the message middleware again; after the program consumes data from the distributed publish-subscribe messaging system kafka, it writes the computed data to Hbase (request access) or Tidb (batch synchronization) according to the delivery mode, and writes the user count of each channel back into kafka, thereby obtaining the number of users under each channel.
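例如，以下Python示意代码演示按投放方式选择写入目标（Hbase或Tidb）并统计各渠道用户数的逻辑（函数名与数据结构均为假设）：For example, the following Python sketch illustrates the logic of choosing the write target (Hbase or Tidb) by delivery mode and counting the users under each channel (function names and data structures are assumptions):

```python
def route_delivery(records, mode):
    # "request" -> Hbase (request access); otherwise Tidb (batch synchronization)
    target = "Hbase" if mode == "request" else "Tidb"
    channel_counts = {}
    for rec in records:
        channel = rec["channel"]
        channel_counts[channel] = channel_counts.get(channel, 0) + 1
    # In the scheme, channel_counts would then be written back into kafka.
    return target, channel_counts

records = [
    {"user_id": "u1", "channel": "sms"},
    {"user_id": "u2", "channel": "sms"},
    {"user_id": "u3", "channel": "email"},
]
target, counts = route_delivery(records, "batch")
```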

本实施例中，通过依据预设的用户操作事件和动态筛选条件属性，生成群体类别，并构建埋点事件，并通过埋点事件收集用户的用户数据，有利于根据不同群体类别，获取不同的用户数据；通过分布式发布订阅消息系统kafka和分布式计算引擎Spark的数据分析，获取每个用户对应的群体类别，实现对用户的精准推送，提高信息推送的效率。In this embodiment, group categories are generated based on preset user operation events and dynamic filtering-condition attributes, tracking events are constructed, and user data is collected through the tracking events, which makes it possible to obtain different user data for different group categories. Through the data analysis of the distributed publish-subscribe messaging system kafka and the distributed computing engine Spark, the group category corresponding to each user is obtained, achieving accurate push to users and improving the efficiency of information push.

请参阅图3,图3示出了步骤S3的一种具体实施方式,步骤S3中,配置分布式发布订阅消息系统kafka中主题topic的参数信息,并基于配置好的主题topic,对存储于分布式搜索引擎ElasticSearch中的用户数据进行数据分析,将得到的分析结果存储到消息中间件中的具体实现过程,详叙如下:Please refer to FIG. 3 , which shows a specific implementation of step S3. In step S3 , parameter information of the topic in the distributed publish-subscribe messaging system kafka is configured, and based on the configured topic, data analysis is performed on user data stored in the distributed search engine ElasticSearch, and the specific implementation process of storing the obtained analysis results in the message middleware is described in detail as follows:

S31:配置分布式发布订阅消息系统kafka中主题topic的参数信息,并基于配置好的主题topic对用户数据进行数据分析,得到分析结果。S31: Configure the parameter information of the topic in the distributed publish-subscribe messaging system Kafka, and perform data analysis on the user data based on the configured topic to obtain the analysis results.

具体的，由于在使用分布式发布订阅消息系统kafka发送消息和消费消息之前，必须先配置主题topic的参数信息，则通过分布式发布订阅消息系统kafka配置主题topic的参数信息，主要配置主题topic的类型、名称和相关参数等。设置完主题topic后，对用户数据进行数据分析，得到分析结果。Specifically, since the parameter information of a topic must be configured before messages can be sent and consumed with the distributed publish-subscribe messaging system kafka, the topic's parameter information is configured through kafka, mainly its type, name, and related parameters. After the topic is set up, data analysis is performed on the user data to obtain the analysis results.
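例如，以下Python示意代码演示组装主题topic的参数信息（名称、分区数等取值仅作示例，并非本方案规定的数值）：For example, the following Python sketch assembles the parameter information of a topic (the name, partition count, and other values are illustrative only, not values prescribed by this scheme):

```python
def build_topic_config(name, num_partitions=3, replication_factor=2):
    # Illustrative parameter set for a kafka topic; the concrete
    # values here are assumptions, not prescribed by the scheme.
    if not name:
        raise ValueError("topic name is required")
    return {
        "name": name,
        "num_partitions": num_partitions,
        "replication_factor": replication_factor,
        "config": {"retention.ms": "604800000"},  # 7 days, example only
    }

topic = build_topic_config("user-analysis-results")
```

实际系统中，这些参数会通过kafka的管理接口创建主题。In a real system, these parameters would be passed to kafka's administration interface to create the topic.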

S32:将分析结果封装成json字符串,得到分析结果json字符串,并将分析结果json字符串写入消息中间件。S32: Encapsulate the analysis result into a json string to obtain the analysis result json string, and write the analysis result json string into the message middleware.

具体的,将分析结果以json字符串的形式写入消息中间件,便于后续的分布式计算引擎Spark对分析结果的获取。Specifically, the analysis results are written into the message middleware in the form of a json string, so that the subsequent distributed computing engine Spark can obtain the analysis results.
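例如，以下Python示意代码演示将分析结果封装成json字符串（字段名为假设）：For example, the following Python sketch encapsulates an analysis result into a JSON string (field names are assumptions):

```python
import json

def encapsulate_analysis_result(group_category, user_count, user_ids):
    # Assemble the analysis result and serialize it as a JSON string
    # before writing it to the message middleware.
    payload = {
        "group_category": group_category,
        "user_count": user_count,
        "user_ids": user_ids,
    }
    return json.dumps(payload, ensure_ascii=False)

result_json = encapsulate_analysis_result("cosmetics_interest", 2, ["u001", "u002"])
```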

本实施例中，通过分布式发布订阅消息系统kafka中主题topic的参数信息，对用户数据进行数据分析，并将得到的分析结果封装成json字符串，写入消息中间件中，有利于对用户数据的分析，便于后续的分布式计算引擎Spark对分析结果的获取，有利于提高对用户群体的分析，进而提高推送效率。In this embodiment, the user data is analyzed through the parameter information of the topic in the distributed publish-subscribe messaging system kafka, and the obtained analysis results are encapsulated into a JSON string and written into the message middleware, which facilitates the analysis of the user data and the subsequent acquisition of the analysis results by the distributed computing engine Spark, improving the analysis of user groups and thereby the push efficiency.

请参阅图4,图4示出了步骤S4的一种具体实施方式,步骤S4中,采用分布式计算引擎Spark,从消息中间件中获取分析结果,并对分析结果进行过滤分析,将得到的过滤分析结果回写到消息中间件中的具体实现过程,详叙如下:Please refer to FIG. 4 , which shows a specific implementation of step S4. In step S4, the distributed computing engine Spark is used to obtain the analysis results from the message middleware, and the analysis results are filtered and analyzed. The specific implementation process of writing the obtained filtered analysis results back to the message middleware is described in detail as follows:

S41:将分析结果封装成SQL命令。S41: Encapsulate the analysis results into SQL commands.

具体的，上述步骤S32已经将分析结果转化成json字符串的格式；从消息中间件中获取到分析结果后，将其转化成结构化查询语言的SQL命令，封装的SQL命令便于查询hive中生成的用户数据分析表，通过对应的关系，获取用户数量和用户信息。Specifically, step S32 above has already converted the analysis results into the JSON-string format; after the analysis results are obtained from the message middleware, they are converted into SQL commands of the structured query language. The encapsulated SQL commands facilitate querying the user-data analysis tables generated in hive, and the number of users and the user information are obtained through the corresponding relationships.
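例如，以下Python示意代码演示将json字符串形式的分析结果转化成查询hive表的SQL命令（表名、列名均为假设）：For example, the following Python sketch converts a JSON-string analysis result into an SQL command for querying the hive table (table and column names are assumptions):

```python
import json

def analysis_to_sql(analysis_json, table="user_event_analysis"):
    # Convert a JSON-string analysis result into an SQL command that
    # returns the user count of the group category from the hive table.
    # A production system should bind/escape values instead of inlining them.
    result = json.loads(analysis_json)
    group = result["group_category"]
    return (
        "SELECT COUNT(DISTINCT user_id) AS user_count "
        f"FROM {table} WHERE group_category = '{group}'"
    )

sql = analysis_to_sql('{"group_category": "cosmetics_interest"}')
```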

S42：通过分布式计算引擎Spark执行SQL命令，遍寻hive制成的表，得到群体类别的用户数量和用户信息，并将用户数量和用户信息保存于消息中间件中。S42: Execute the SQL commands through the distributed computing engine Spark, traverse the tables generated in hive, obtain the number of users and the user information of each group category, and save the number of users and the user information in the message middleware.

具体的，分布式计算引擎Spark从消息中间件中消费到分析结果，分析结果以json字符串的形式存在；将其封装成SQL命令并执行SQL命令，遍寻hive制成的表，过滤分析出用户数量和用户信息。Specifically, the distributed computing engine Spark consumes the analysis results, which exist in the form of JSON strings, from the message middleware, encapsulates them into SQL commands, executes the commands, traverses the tables generated in hive, and filters out the number of users and the user information.

其中,hive是基于Hadoop的一个数据仓库工具,用来进行数据提取、转化、加载,这是一种可以存储、查询和分析存储在Hadoop中的大规模数据的机制。hive数据仓库工具能将结构化的数据文件映射为一张数据库表,并提供SQL查询功能,能将SQL语句转变成MapReduce任务来执行。在本发明中,由于hive提供SQL查询功能,通过分布式计算引擎Spark将分析结果封装成SQL命令查询hive制成的表。hive能够实时记录并存储着用户在群体类别进行的操作事件,并生成表格的形式进行保存。通过将分析结果封装成SQL命令的形式,将分析结果与hive生成的表的数据进行对比,能够得出群体类别的用户数量和用户信息。Among them, hive is a data warehouse tool based on Hadoop, which is used for data extraction, conversion and loading. It is a mechanism that can store, query and analyze large-scale data stored in Hadoop. The hive data warehouse tool can map structured data files into a database table, and provide SQL query function, which can convert SQL statements into MapReduce tasks for execution. In the present invention, since hive provides SQL query function, the analysis results are encapsulated into SQL commands to query the table made by hive through the distributed computing engine Spark. Hive can record and store the operation events performed by users in group categories in real time, and generate tables for storage. By encapsulating the analysis results in the form of SQL commands and comparing the analysis results with the data in the table generated by hive, the number of users and user information of group categories can be obtained.

其中,用户数量是在分群信息下的用户数量,在用户在该群体类别设置的筛选条件下,进行点击或者浏览操作事件的人数。The number of users refers to the number of users under the group information, and is the number of users who have clicked or browsed under the filtering conditions set for the group category.

其中,用户信息是在群体类别下的对应每个用户的个人信息,包括每个用户的电话、邮件等等信息。Among them, user information is the personal information corresponding to each user under the group category, including each user's phone number, email and other information.

本实施例中，由于hive能够实时记录并存储着用户在群体类别进行的操作事件，并生成表格的形式进行保存，通过将分析结果封装成SQL命令，并通过分布式计算引擎Spark执行SQL命令，遍寻hive制成的表，实现获取分群信息中的用户数量和用户信息，为后续的业务信息推送确定推送人数和推送目标信息，进而提高推送效率。In this embodiment, since hive records and stores in real time the operation events performed by users under each group category and saves them in table form, encapsulating the analysis results into SQL commands and executing them through the distributed computing engine Spark to traverse the tables generated in hive obtains the number of users and the user information in the grouping information, determining the number of push recipients and the push target information for subsequent business-information push and thereby improving push efficiency.

请参阅图5，图5示出了步骤S42的一种具体实施方式，步骤S42中通过分布式计算引擎Spark执行SQL命令，遍寻hive制成的表，得到群体类别的用户数量和用户信息，并将用户数量和用户信息保存于消息中间件中的具体实现过程，详叙如下：Please refer to FIG. 5, which shows a specific implementation of step S42. In step S42, the SQL commands are executed by the distributed computing engine Spark, and the tables generated in hive are traversed to obtain the number of users and the user information of each group category, which are saved in the message middleware. The specific implementation process is described in detail as follows:

S421：通过SQL命令遍历hive库制成的表中的用户数据，过滤删除重复的用户数据，得到过滤结果。S421: Traverse the user data in the tables generated in the hive database through SQL commands, filter out and delete the duplicate user data, and obtain the filtering result.

具体的,由于用户数据中可能存在重复的信息,这一部分的重复的信息相当于是冗余的,可以将其过滤删除掉,得到过滤结果。Specifically, since there may be repeated information in the user data, this part of the repeated information is equivalent to redundancy, and it can be filtered and deleted to obtain the filtering result.

例如,在用户在不同时间段,重复点击同一操作事件,记录在群体类别里面就属于重复的信息,为了提高数据处理效率,可以将其过滤删除。For example, if a user repeatedly clicks on the same operation event in different time periods, the information recorded in the group category is considered duplicate information. In order to improve data processing efficiency, it can be filtered and deleted.

S422：统计过滤结果的分群信息中的用户数量，并匹配用户对应信息，得到用户数量和用户信息，并将用户数量和用户信息保存于消息中间件中。S422: Count the number of users in the grouping information of the filtering result, match each user's corresponding information to obtain the number of users and the user information, and save the number of users and the user information in the message middleware.

具体的，由于SQL命令是基于用户数据而生成的，通过遍寻hive制成的表，能够删除重复分群信息的用户信息；经过统计后得到用户数量，再经过匹配用户对应的信息，得到用户数量和用户信息。Specifically, since the SQL commands are generated based on the user data, traversing the tables generated in hive removes the user information with duplicate grouping information; after counting, the number of users is obtained, and by matching each user's corresponding information, the number of users and the user information are obtained.
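例如，以下Python示意代码演示过滤删除重复的用户数据、统计用户数量并匹配用户信息的过程（输入数据结构为假设）：For example, the following Python sketch illustrates filtering out duplicate user data, counting the users, and matching the user information (input data structures are assumptions):

```python
def dedup_count_and_match(rows, user_info):
    # Drop duplicate records of the same user repeating the same event,
    # then count the remaining users and match their information.
    seen = set()
    deduped = []
    for row in rows:
        key = (row["user_id"], row["event"])
        if key not in seen:
            seen.add(key)
            deduped.append(row)
    user_ids = {row["user_id"] for row in deduped}
    matched = {uid: user_info.get(uid, {}) for uid in user_ids}
    return len(user_ids), matched

rows = [
    {"user_id": "u001", "event": "add_to_cart"},
    {"user_id": "u001", "event": "add_to_cart"},  # duplicate, filtered out
    {"user_id": "u002", "event": "add_to_cart"},
]
info = {"u001": {"phone": "000-0001"}, "u002": {"phone": "000-0002"}}
count, matched = dedup_count_and_match(rows, info)
```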

本实施例中,通过过滤删除重复的分群信息,并统计过滤结果后的分群信息中的用户数量,并匹配用户对应信息,得到用户数量和用户信息,有效提高对群体类别的数据处理,实现对用户数量和用户信息获取效率,提高推送精准度,进而提高推送效率。In this embodiment, duplicate grouping information is filtered and deleted, the number of users in the grouping information after filtering is counted, and the corresponding information of the users is matched to obtain the number of users and user information, thereby effectively improving the data processing of group categories, achieving the efficiency of obtaining the number of users and user information, improving the push accuracy, and thus improving the push efficiency.

步骤S4的另一种具体实施方式详述如下:Another specific implementation of step S4 is described in detail as follows:

将过滤分析结果封装成json字符串,得到过滤分析结果json字符串。Encapsulate the filtering analysis result into a json string to obtain the filtering analysis result json string.

具体的,将过滤分析结果封装成json字符串,便于对过滤分析结果的存储,和后续对其进行分析。Specifically, the filtering analysis results are encapsulated into a json string to facilitate the storage of the filtering analysis results and subsequent analysis thereof.

将过滤分析结果json字符串回写到消息中间件。Write the filtering and analysis result json string back to the message middleware.

具体的,将过滤分析结果重新封装成json字符串,并将其回写到消息中间件,为后续步骤kafka消费中间件的数据提供基础。Specifically, the filtering and analysis results are repackaged into a JSON string and written back to the message middleware, providing a basis for the data of the Kafka consumption middleware in the subsequent step.

本实施例中,通过将过滤分析结果封装成json字符串,并将过滤分析结果json字符串回写到消息中间件,提高用户数据的处理效率,为后续步骤kafka消费中间件的数据提供基础,进而确定群体类别的用户,提高推送的效率。In this embodiment, by encapsulating the filtering and analysis results into a JSON string and writing the filtering and analysis result JSON string back to the message middleware, the processing efficiency of user data is improved, providing a basis for the data of the subsequent kafka consumption middleware, and then determining the group category of users, thereby improving the push efficiency.

在步骤S6之后,该基于kafka的信息推送方法还包括:After step S6, the kafka-based information push method further includes:

通过调度器Quartz，定时地遍历群体类别，更新群体类别对应的用户。Through the scheduler Quartz, the group categories are traversed at scheduled times and the users corresponding to the group categories are updated.

具体的，通过设置定时任务，使用调度器Quartz，按照预设时间去遍历群体类别；如果群体类别投放时设置了每日更新，则服务器把群体类别及投放渠道写入分布式发布订阅消息系统kafka组件中，再进行一次用户数量和用户信息的获取，从而更新群体类别对应的用户。Specifically, a scheduled task is set up, and the scheduler Quartz traverses the group categories at preset times. If daily update was set when a group category was delivered, the server writes the group category and its delivery channels into the distributed publish-subscribe messaging system kafka component, and obtains the number of users and the user information once more, thereby updating the users corresponding to the group category.
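例如，以下Python示意代码演示定时任务遍历群体类别、筛选出需要每日更新的群体类别的逻辑（调度本身由Quartz等调度器完成，此处仅示意任务体）：For example, the following Python sketch illustrates the body of the scheduled job that traverses the group categories and selects those marked for daily update (the scheduling itself would be done by a scheduler such as Quartz; only the job body is sketched here):

```python
def select_groups_for_refresh(groups):
    # Job body: traverse all group categories and return those configured
    # for daily update; in the scheme these are re-written into kafka so
    # that the user count and user information are recomputed.
    return [g["name"] for g in groups if g.get("daily_update")]

groups = [
    {"name": "cosmetics_interest", "daily_update": True},
    {"name": "sports_interest", "daily_update": False},
]
to_refresh = select_groups_for_refresh(groups)
```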

其中,调度器Quartz是一个完全由java编写的开源作业调度框架。在本发明中,通过调度器Quartz定时更新群体类别对应的用户。The scheduler Quartz is an open source job scheduling framework written entirely in Java. In the present invention, the scheduler Quartz regularly updates the users corresponding to the group categories.

本实施例中，通过调度器Quartz，定时地遍历群体类别，更新群体类别对应的用户，有利于对群体类别数据的更新，有利于业务信息的推送，提高业务推送的效率。In this embodiment, the group categories are traversed at scheduled times through the scheduler Quartz, and the users corresponding to the group categories are updated, which facilitates updating the group-category data and pushing business information, improving the efficiency of business push.

在步骤S6之后,该基于kafka的信息推送方法还包括:After step S6, the kafka-based information push method further includes:

根据群体类别的生成属性,为群体类别包含的用户赋予相应的标签。According to the generated attributes of the group category, the users included in the group category are assigned corresponding labels.

具体的,根据建立群体类别时所设置的筛选条件属性,为群体类别包含的用户赋予相应的标签。Specifically, according to the filtering condition attributes set when the group category is established, corresponding labels are assigned to the users included in the group category.

在一具体实施例中：配置的事件名称是"商品加入购物车未付款"，筛选条件是"发生了5次"，通过数据处理端，生成了用户包，包含了该事件的用户；此时加工生成标签，给该用户包的用户打上"有购买意向"的标签，接着可以利用各种渠道给用户推荐该商品，提高信息推送的效率。In a specific embodiment, the configured event name is "product added to shopping cart but not paid for", and the filtering condition is "occurred 5 times". Through the data processing side, a user package containing the users of this event is generated; a label is then produced, and the users in this user package are tagged as "having purchase intention". Various channels can then be used to recommend the product to these users, improving the efficiency of information push.
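例如，以下Python示意代码按照该具体实施例的思路，为达到筛选条件的用户打标签（英文标识符为中文名称的假设替代）：For example, the following Python sketch follows this specific embodiment and labels the users who meet the filtering condition (the English identifiers are assumed stand-ins for the Chinese names):

```python
def assign_labels(event_counts, event="cart_not_paid", threshold=5,
                  label="purchase_intent"):
    # Tag every user whose count of the configured event reaches the
    # filtering threshold ("occurred 5 times" in the embodiment).
    return {
        uid: label
        for uid, counts in event_counts.items()
        if counts.get(event, 0) >= threshold
    }

counts = {
    "u001": {"cart_not_paid": 5},  # meets the condition -> labeled
    "u002": {"cart_not_paid": 2},  # below threshold -> not labeled
}
labels = assign_labels(counts)
```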

本实施例中，通过根据建立群体类别时所设置的筛选条件属性，为群体类别包含的用户赋予相应的标签，实现将不同的群体类别包含的用户赋予标识，便于后期识别分类。In this embodiment, by assigning corresponding labels to the users included in a group category according to the filtering-condition attributes set when the group category was established, the users included in different group categories are given identifiers, facilitating later identification and classification.

如图6所示,本实施例的基于kafka的信息推送装置包括:群体类别生成模块51、用户数据收集模块52、用户数据分析模块53、过滤分析结果模块54、群体类别确定模块55及业务信息推送模块56,其中:As shown in FIG6 , the Kafka-based information push device of this embodiment includes: a group category generation module 51, a user data collection module 52, a user data analysis module 53, a filtering and analysis result module 54, a group category determination module 55 and a business information push module 56, wherein:

群体类别生成模块51,用于依据预设的用户操作事件和动态筛选条件属性,生成群体类别;A group category generating module 51 is used to generate group categories according to preset user operation events and dynamic screening condition attributes;

用户数据收集模块52,用于构建埋点事件,并通过埋点事件收集用户的用户数据,将用户数据存储于分布式搜索引擎ElasticSearch中;The user data collection module 52 is used to construct a buried event, collect user data of the user through the buried event, and store the user data in the distributed search engine ElasticSearch;

用户数据分析模块53,用于配置分布式发布订阅消息系统kafka中主题topic的参数信息,并基于配置好的主题topic,对存储于分布式搜索引擎ElasticSearch中的用户数据进行数据分析,将得到的分析结果存储到消息中间件中;The user data analysis module 53 is used to configure the parameter information of the topic in the distributed publish-subscribe messaging system kafka, and based on the configured topic, perform data analysis on the user data stored in the distributed search engine ElasticSearch, and store the obtained analysis results in the messaging middleware;

过滤分析结果模块54,用于采用分布式计算引擎Spark,从消息中间件中获取分析结果,并对分析结果进行过滤分析,将得到的过滤分析结果回写到消息中间件中;The filtering analysis result module 54 is used to obtain the analysis results from the message middleware by using the distributed computing engine Spark, and to filter and analyze the analysis results, and write the obtained filtering analysis results back to the message middleware;

群体类别确定模块55,用于将消息中间件中的过滤分析结果,写入到分布式发布订阅消息系统kafka中,通过分布式发布订阅消息系统kafka对过滤分析结果进行解析处理,得到每个用户对应的群体类别;The group category determination module 55 is used to write the filtering analysis results in the message middleware into the distributed publish-subscribe message system Kafka, and parse the filtering analysis results through the distributed publish-subscribe message system Kafka to obtain the group category corresponding to each user;

业务信息推送模块56,用于根据业务信息,从群体类别中,确定待推荐的群体类别,并向待推荐的群体类别包含的用户推送业务信息。The business information push module 56 is used to determine the group category to be recommended from the group categories according to the business information, and push the business information to the users included in the group category to be recommended.

进一步的,用户数据分析模块53包括:Furthermore, the user data analysis module 53 includes:

主题设置单元,用于配置分布式发布订阅消息系统kafka中主题topic的参数信息,并基于配置好的主题topic对用户数据进行数据分析,得到分析结果;The topic setting unit is used to configure the parameter information of the topic in the distributed publish-subscribe messaging system Kafka, and analyze the user data based on the configured topic to obtain the analysis results;

分析结果封装单元,用于将分析结果封装成json字符串,得到分析结果json字符串,并将分析结果json字符串写入消息中间件。The analysis result encapsulation unit is used to encapsulate the analysis result into a json string, obtain the analysis result json string, and write the analysis result json string into the message middleware.

进一步的,过滤分析结果模块54包括:Furthermore, the filtering analysis result module 54 includes:

SQL命令生成单元,用于将分析结果封装成SQL命令;SQL command generation unit, used to encapsulate the analysis results into SQL commands;

用户数据获取单元，用于通过分布式计算引擎Spark执行SQL命令，遍寻hive制成的表，得到分群信息的用户数量和用户信息，并将用户数量和用户信息保存于消息中间件中。The user data acquisition unit is used to execute SQL commands through the distributed computing engine Spark, traverse the tables generated in hive, obtain the number of users and the user information of the grouping information, and save the number of users and the user information in the message middleware.

进一步的,用户数据获取单元包括:Furthermore, the user data acquisition unit includes:

过滤结果获取子单元,用于通过SQL命令遍历hive库制成的表中的用户数据,过滤删除重复的用户数据,得到过滤结果;The filtering result acquisition subunit is used to traverse the user data in the table created by the hive library through SQL commands, filter and delete duplicate user data, and obtain the filtering results;

用户匹配子单元，用于统计过滤结果的群体类别中的用户数量，并匹配用户对应信息，得到用户数量和用户信息，并将用户数量和用户信息保存于消息中间件中。The user matching subunit is used to count the number of users in the group category of the filtering result, match each user's corresponding information to obtain the number of users and the user information, and save the number of users and the user information in the message middleware.

过滤分析结果模块54还包括:The filtering analysis result module 54 also includes:

字符串获取单元,用于将过滤分析结果封装成json字符串,得到过滤分析结果json字符串;The string acquisition unit is used to encapsulate the filtering analysis result into a json string to obtain the filtering analysis result json string;

字符串回写单元,用于将过滤分析结果json字符串回写到消息中间件。The string writing unit is used to write the json string of the filtering and analysis results back to the message middleware.

进一步的,基于kafka的信息推送装置还包括:Furthermore, the information push device based on Kafka also includes:

群体类别更新模块，用于通过调度器Quartz，定时地遍历群体类别，更新群体类别对应的用户。The group category update module is used to traverse the group categories at scheduled times through the scheduler Quartz and update the users corresponding to the group categories.

进一步的,该基于kafka的信息推送装置还包括:Furthermore, the kafka-based information push device also includes:

标签获取模块,用于根据群体类别的生成属性,为群体类别包含的用户赋予相应的标签。The label acquisition module is used to assign corresponding labels to users included in the group category according to the generation attributes of the group category.

为解决上述技术问题,本申请实施例还提供计算机设备。具体请参阅图7,图7为本实施例计算机设备基本结构框图。To solve the above technical problems, the present application also provides a computer device. Please refer to FIG7 for details, which is a basic structural block diagram of the computer device of the present embodiment.

计算机设备6包括通过系统总线相互通信连接存储器61、处理器62、网络接口63。需要指出的是,图中仅示出了具有三种组件存储器61、处理器62、网络接口63的计算机设备6,但是应理解的是,并不要求实施所有示出的组件,可以替代的实施更多或者更少的组件。其中,本技术领域技术人员可以理解,这里的计算机设备是一种能够按照事先设定或存储的指令,自动进行数值计算和/或信息处理的设备,其硬件包括但不限于微处理器、专用集成电路(Application Specific Integrated Circuit,ASIC)、可编程门阵列(Field-Programmable Gate Array,FPGA)、数字处理器(Digital Signal Processor,DSP)、嵌入式设备等。The computer device 6 includes a memory 61, a processor 62, and a network interface 63 that are interconnected through a system bus. It should be noted that the figure only shows a computer device 6 having three components: a memory 61, a processor 62, and a network interface 63, but it should be understood that it is not required to implement all the components shown, and more or fewer components can be implemented instead. Among them, those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculations and/or information processing according to pre-set or stored instructions, and its hardware includes but is not limited to microprocessors, application specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, etc.

计算机设备可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。计算机设备可以与用户通过键盘、鼠标、遥控器、触摸板或声控设备等方式进行人机交互。Computer devices can be computing devices such as desktop computers, notebooks, PDAs, and cloud servers. Computer devices can interact with users through keyboards, mice, remote controls, touch pads, or voice control devices.

存储器61至少包括一种类型的可读存储介质,可读存储介质包括闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等)、随机访问存储器(RAM)、静态随机访问存储器(SRAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、可编程只读存储器(PROM)、磁性存储器、磁盘、光盘等。在一些实施例中,存储器61可以是计算机设备6的内部存储单元,例如该计算机设备6的硬盘或内存。在另一些实施例中,存储器61也可以是计算机设备6的外部存储设备,例如该计算机设备6上配备的插接式硬盘,智能存储卡(SmartMedia Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。当然,存储器61还可以既包括计算机设备6的内部存储单元也包括其外部存储设备。本实施例中,存储器61通常用于存储安装于计算机设备6的操作系统和各类应用软件,例如基于kafka的信息推送方法的程序代码等。此外,存储器61还可以用于暂时地存储已经输出或者将要输出的各类数据。The memory 61 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, disk, optical disk, etc. In some embodiments, the memory 61 can be an internal storage unit of the computer device 6, such as a hard disk or memory of the computer device 6. In other embodiments, the memory 61 can also be an external storage device of the computer device 6, such as a plug-in hard disk equipped on the computer device 6, a smart memory card (SmartMedia Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), etc. Of course, the memory 61 can also include both the internal storage unit of the computer device 6 and its external storage device. In this embodiment, the memory 61 is generally used to store the operating system and various application software installed on the computer device 6, such as the program code of the information push method based on kafka, etc. In addition, the memory 61 can also be used to temporarily store various types of data that have been output or are to be output.

处理器62在一些实施例中可以是中央处理器(Central Processing Unit,CPU)、控制器、微控制器、微处理器、或其他数据处理芯片。该处理器62通常用于控制计算机设备6的总体操作。本实施例中,处理器62用于运行存储器61中存储的程序代码或者处理数据,例如运行一种基于kafka的信息推送方法的程序代码。In some embodiments, the processor 62 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or other data processing chips. The processor 62 is generally used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is used to run the program code stored in the memory 61 or process data, such as running a program code of a kafka-based information push method.

网络接口63可包括无线网络接口或有线网络接口,该网络接口63通常用于在计算机设备6与其他电子设备之间建立通信连接。The network interface 63 may include a wireless network interface or a wired network interface, and the network interface 63 is generally used to establish a communication connection between the computer device 6 and other electronic devices.

本申请还提供了另一种实施方式,即提供一种计算机可读存储介质,计算机可读存储介质存储有服务器维护程序,服务器维护程序可被至少一个处理器执行,以使至少一个处理器执行如上述的一种基于kafka的信息推送方法的步骤。The present application also provides another implementation, namely, providing a computer-readable storage medium, which stores a server maintenance program. The server maintenance program can be executed by at least one processor to enable the at least one processor to perform the steps of the above-mentioned Kafka-based information push method.

通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各个实施例的方法。Through the description of the above implementation methods, those skilled in the art can clearly understand that the above-mentioned embodiment methods can be implemented by means of software plus a necessary general hardware platform, and of course by hardware, but in many cases the former is a better implementation method. Based on such an understanding, the technical solution of the present application, or the part that contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a disk, or an optical disk), and includes a number of instructions for a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods of each embodiment of the present application.

本发明所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain),本质上是一个去中心化的数据库,是一串使用密码学方法相关联产生的数据块,每一个数据块中包含了一批次网络交易的信息,用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。The blockchain referred to in this invention is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanism, encryption algorithm, etc. Blockchain is essentially a decentralized database, a string of data blocks generated by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and generate the next block. Blockchain can include the underlying blockchain platform, platform product service layer, and application service layer.

显然,以上所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例,附图中给出了本申请的较佳实施例,但并不限制本申请的专利范围。本申请可以以许多不同的形式来实现,相反地,提供这些实施例的目的是使对本申请的公开内容的理解更加透彻全面。尽管参照前述实施例对本申请进行了详细的说明,对于本领域的技术人员来而言,其依然可以对前述各具体实施方式所记载的技术方案进行修改,或者对其中部分技术特征进行等效替换。凡是利用本申请说明书及附图内容所做的等效结构,直接或间接运用在其他相关的技术领域,均同理在本申请专利保护范围之内。Obviously, the embodiments described above are only some embodiments of the present application, rather than all embodiments. The preferred embodiments of the present application are given in the accompanying drawings, but they do not limit the patent scope of the present application. The present application can be implemented in many different forms. On the contrary, the purpose of providing these embodiments is to make the understanding of the disclosure of the present application more thorough and comprehensive. Although the present application is described in detail with reference to the aforementioned embodiments, for those skilled in the art, it is still possible to modify the technical solutions recorded in the aforementioned specific implementation methods, or to perform equivalent replacement of some of the technical features therein. Any equivalent structure made using the contents of the specification and drawings of this application, directly or indirectly used in other related technical fields, is similarly within the scope of patent protection of this application.

Claims (10)

Translated from Chinese
1. A kafka-based information pushing method, comprising:
generating group categories according to preset user operation events and dynamic filtering condition attributes;
constructing tracking events, collecting user data of users through the tracking events, and storing the user data in the distributed search engine ElasticSearch;
configuring parameter information of a topic in the distributed publish-subscribe messaging system kafka, performing data analysis on the user data stored in the distributed search engine ElasticSearch based on the configured topic, and storing the obtained analysis results in message middleware;
using the distributed computing engine Spark to obtain the analysis results from the message middleware, performing filtering analysis on the analysis results, and writing the obtained filtering analysis results back to the message middleware;
writing the filtering analysis results in the message middleware into the distributed publish-subscribe messaging system kafka, and parsing the filtering analysis results through the distributed publish-subscribe messaging system kafka to obtain the group category corresponding to each user;
determining, according to business information, a group category to be recommended from among the group categories, and pushing the business information to the users included in the group category to be recommended.

2. The kafka-based information pushing method according to claim 1, wherein configuring the parameter information of the topic in the distributed publish-subscribe messaging system kafka, performing data analysis on the user data stored in the distributed search engine ElasticSearch based on the configured topic, and storing the obtained analysis results in the message middleware comprises:
configuring the parameter information of the topic in the distributed publish-subscribe messaging system kafka, and performing data analysis on the user data based on the configured topic to obtain analysis results;
encapsulating the analysis results into a JSON string to obtain an analysis-result JSON string, and writing the analysis-result JSON string into the message middleware.

3. The kafka-based information pushing method according to claim 1, wherein using the distributed computing engine Spark to obtain the analysis results from the message middleware, performing filtering analysis on the analysis results, and writing the obtained filtering analysis results back to the message middleware comprises:
encapsulating the analysis results into SQL commands;
executing the SQL commands through the distributed computing engine Spark, traversing tables built from hive tables to obtain the number of users and the user information of the group categories, and saving the number of users and the user information in the message middleware.

4. The kafka-based information pushing method according to claim 3, wherein executing the SQL commands through the distributed computing engine Spark, traversing the tables built from hive tables to obtain the number of users and the user information of the group categories, and saving the number of users and the user information in the message middleware comprises:
traversing the user data in the hive tables through the SQL commands, and filtering out duplicate user data to obtain a filtering result;
counting the number of users in each group category of the filtering result, and matching the corresponding user information to obtain the number of users and the user information, and saving the number of users and the user information in the middleware of the topic.

5. The kafka-based information pushing method according to claim 1, wherein using the distributed computing engine Spark to obtain the analysis results from the message middleware, performing filtering analysis on the analysis results, and writing the obtained filtering analysis results back to the message middleware further comprises:
encapsulating the filtering analysis results into a JSON string to obtain a filtering-analysis-result JSON string;
writing the filtering-analysis-result JSON string back to the message middleware.

6. The kafka-based information pushing method according to any one of claims 1 to 5, wherein after determining, according to the business information, the group category to be recommended from among the group categories, and pushing the business information to the users included in the group category to be recommended, the method further comprises:
periodically traversing the group categories through the scheduler Quartz, and updating the users corresponding to the group categories.

7. The kafka-based information pushing method according to any one of claims 1 to 5, wherein after determining, according to the business information, the group category to be recommended from among the group categories, and pushing the business information to the users included in the group category to be recommended, the method further comprises:
assigning corresponding labels to the users included in a group category according to the generation attributes of the group category.

8. A kafka-based information pushing device, comprising:
a group category generation module, configured to generate group categories according to preset user operation events and dynamic filtering condition attributes;
a user data collection module, configured to construct tracking events, collect user data of users through the tracking events, and store the user data in the distributed search engine ElasticSearch;
a user data analysis module, configured to configure parameter information of a topic in the distributed publish-subscribe messaging system kafka, perform data analysis on the user data stored in the distributed search engine ElasticSearch based on the configured topic, and store the obtained analysis results in message middleware;
a filtering analysis module, configured to use the distributed computing engine Spark to obtain the analysis results from the message middleware, perform filtering analysis on the analysis results, and write the obtained filtering analysis results back to the message middleware;
a group category determination module, configured to write the filtering analysis results in the message middleware into the distributed publish-subscribe messaging system kafka, and parse the filtering analysis results through the distributed publish-subscribe messaging system kafka to obtain the group category corresponding to each user;
a business information pushing module, configured to determine, according to business information, a group category to be recommended from among the group categories, and push the business information to the users included in the group category to be recommended.

9. A computer device, comprising a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, the kafka-based information pushing method according to any one of claims 1 to 7 is implemented.

10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the kafka-based information pushing method according to any one of claims 1 to 7 is implemented.
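Claims 2 to 5 describe a concrete data path: deduplicate the user data, count the users in each group category, match the corresponding user information, and encapsulate the result as a JSON string for the message middleware. A Spark-free Python sketch of that filter, count, and encapsulate step follows; the record fields (`user_id`, `group_category`) are assumed for illustration and are not the patent's schema:

```python
import json
from collections import Counter


def filter_and_summarize(records):
    """Deduplicate user records, count users per group category, and
    encapsulate the result as a JSON string, mirroring the filtering
    analysis result the claims write back to the message middleware."""
    # Filter out duplicate rows for the same user, keeping the first seen.
    seen = set()
    unique = []
    for rec in records:
        if rec["user_id"] not in seen:
            seen.add(rec["user_id"])
            unique.append(rec)

    # Count the users in each group category and keep the matched user info.
    counts = Counter(rec["group_category"] for rec in unique)
    summary = {
        category: {
            "user_count": counts[category],
            "users": [r["user_id"] for r in unique
                      if r["group_category"] == category],
        }
        for category in counts
    }
    return json.dumps(summary, ensure_ascii=False)


rows = [
    {"user_id": "u1", "group_category": "frequent_buyer"},
    {"user_id": "u1", "group_category": "frequent_buyer"},  # duplicate, dropped
    {"user_id": "u2", "group_category": "new_visitor"},
]
print(filter_and_summarize(rows))
```

In the claimed system this logic would run as Spark SQL over hive tables and the JSON string would be produced to a kafka topic; the sketch only shows the shape of the transformation.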
CN202010350127.1A | 2020-04-28 | 2020-04-28 | Information pushing method, device, equipment and storage medium based on kafka | Active | CN111666490B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010350127.1A CN111666490B (en) | 2020-04-28 | 2020-04-28 | Information pushing method, device, equipment and storage medium based on kafka

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010350127.1A CN111666490B (en) | 2020-04-28 | 2020-04-28 | Information pushing method, device, equipment and storage medium based on kafka

Publications (2)

Publication Number | Publication Date
CN111666490A (en) | 2020-09-15
CN111666490B (en) | 2024-11-01

Family

ID=72382962

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010350127.1A (Active) CN111666490B (en) | Information pushing method, device, equipment and storage medium based on kafka | 2020-04-28 | 2020-04-28

Country Status (1)

Country | Link
CN (1) | CN111666490B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112182369A (en) * | 2020-09-21 | 2021-01-05 | 深圳市彬讯科技有限公司 | Business recommendation methods, recommended devices, equipment and media for installation business operations
CN112559306B (en) * | 2020-11-17 | 2022-11-15 | 贝壳技术有限公司 | User behavior track obtaining method and device and electronic equipment
CN112737974A (en) * | 2020-12-24 | 2021-04-30 | 平安普惠企业管理有限公司 | Service flow processing method and device, computer equipment and storage medium
CN114911865A (en) * | 2021-02-08 | 2022-08-16 | 成都链安科技有限公司 | A blockchain virtual currency data center and virtual currency data processing method
CN112925947B (en) * | 2021-02-22 | 2024-10-18 | 百果园技术(新加坡)有限公司 | Training sample processing method, device, equipment and storage medium
CN113190528B (en) * | 2021-04-21 | 2022-12-06 | 中国海洋大学 | Parallel distributed big data architecture construction method and system
CN113268642A (en) * | 2021-06-25 | 2021-08-17 | 浪潮云信息技术股份公司 | Method for realizing refined access of data of internet of things equipment
CN113572841B (en) * | 2021-07-23 | 2023-06-27 | 上海哔哩哔哩科技有限公司 | Information pushing method and device
CN113505319A (en) * | 2021-07-27 | 2021-10-15 | 上海点融信息科技有限责任公司 | Method, apparatus and medium for updating search content for search engine on BaaS platform
CN114238466A (en) * | 2021-11-29 | 2022-03-25 | 平安科技(深圳)有限公司 | Message pushing method and device, computer equipment and storage medium
CN114168368A (en) * | 2021-12-13 | 2022-03-11 | 平安养老保险股份有限公司 | A method, apparatus, computer equipment and storage medium for processing messages
CN114285699B (en) * | 2021-12-20 | 2023-06-06 | 徐工汉云技术股份有限公司 | Method and device for realizing uniqueness of terminal in distributed gateway session
CN114490130A (en) * | 2022-01-27 | 2022-05-13 | 中国建设银行股份有限公司 | Message subscription method and device, electronic equipment and storage medium
CN115269698A (en) * | 2022-03-31 | 2022-11-01 | 启明信息技术股份有限公司 | A method for quick processing of automobile public opinion
CN115374225B (en) * | 2022-07-26 | 2023-08-25 | 中船奥蓝托无锡软件技术有限公司 | Spatial Environmental Effect Database and Database Working Method
CN116089545B (en) * | 2023-04-07 | 2023-08-22 | 云筑信息科技(成都)有限公司 | Method for collecting storage medium change data into data warehouse

Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
CN110717093A (en) * | 2019-08-27 | 2020-01-21 | 广东工业大学 | Spark-based movie recommendation system and method
CN110913000A (en) * | 2019-11-27 | 2020-03-24 | 浙江华诺康科技有限公司 | Method, system and computer readable storage medium for processing service information

Family Cites Families (1)

Publication number | Priority date | Publication date | Assignee | Title
CN104424555B (en) * | 2013-08-30 | 2018-01-02 | 国际商业机器公司 | For the control method and equipment in publish/subscribe system

Patent Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
CN110717093A (en) * | 2019-08-27 | 2020-01-21 | 广东工业大学 | Spark-based movie recommendation system and method
CN110913000A (en) * | 2019-11-27 | 2020-03-24 | 浙江华诺康科技有限公司 | Method, system and computer readable storage medium for processing service information

Also Published As

Publication number | Publication date
CN111666490A (en) | 2020-09-15

Similar Documents

Publication | Title
CN111666490B (en) | Information pushing method, device, equipment and storage medium based on kafka
US11860874B2 (en) | Multi-partitioning data for combination operations
US11768811B1 (en) | Managing user data in a multitenant deployment
US12141183B2 (en) | Dynamic partition allocation for query execution
US11586692B2 (en) | Streaming data processing
US11924021B1 (en) | Actionable event responder architecture
US11151137B2 (en) | Multi-partition operation in combination operations
US12079175B2 (en) | Streaming synthesis of distributed traces from machine logs
US11461334B2 (en) | Data conditioning for dataset destination
US11163758B2 (en) | External dataset capability compensation
US10726009B2 (en) | Query processing using query-resource usage and node utilization data
US11232100B2 (en) | Resource allocation for multiple datasets
US11416528B2 (en) | Query acceleration data store
US10795884B2 (en) | Dynamic resource allocation for common storage query
US11892976B2 (en) | Enhanced search performance using data model summaries stored in a remote data store
US11875275B1 (en) | Custom time series models in computer analytics systems
CN113010542B (en) | Service data processing method, device, computer equipment and storage medium
CN113282611A (en) | Method and device for synchronizing stream data, computer equipment and storage medium
US11789950B1 (en) | Dynamic storage and deferred analysis of data stream events
CN114564294A (en) | Intelligent service orchestration method, device, computer equipment and storage medium
CN114443940A (en) | A message subscription method, device and device
US11841827B2 (en) | Facilitating generation of data model summaries
CN113760900B (en) | Method and device for real-time summarizing of data and interval summarizing
CN118586986A (en) | Data processing method and device
CN117056309A (en) | Service aging monitoring method and device, computer equipment and storage medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
