CN110704741A

Info

Publication number: CN110704741A
Application number: CN201910940088.8A
Authority: CN
Inventors: 王东京; 张新; 俞东进; 张剑清
Original assignee: Hangzhou Electronic Science and Technology University
Current assignee: Hangzhou Electronic Science and Technology University
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2020-01-17
Anticipated expiration: 2039-09-30
Also published as: CN110704741B

Abstract

The invention discloses an interest point prediction method based on a spatiotemporal point process, comprising the following steps: S1, modeling the user check-in sequence by integrating spatiotemporal context information with a point process; S2, predicting user interest based on the spatiotemporal point process; S3, making predictions that are aware of both the spatiotemporal context and the sequence. The invention uses a spatiotemporal point process to extract the user's behavior patterns and interests from the user's check-in sequence, predicts the user's contextual interest by incorporating the spatiotemporal context, and finally considers the user's general interest and contextual interest together, thereby improving prediction accuracy.

Description

Translated from Chinese

Interest point prediction method based on spatiotemporal point process

Technical field

The invention belongs to the technical field of data mining and recommendation, and in particular relates to a point of interest prediction method based on a spatiotemporal point process.

Background

With the development of information technology, users enjoy convenient information and services but also face information overload, which makes it difficult to find relevant or interesting content in massive online data. Recommender systems can actively mine users' potential interests from their historical records and help them find relevant content in massive online data, meeting user needs and reducing the cost of acquiring information. Predicting user behavior is one of the keys to implementing a personalized recommender system.

However, in the field of POI prediction, traditional methods usually cannot make full use of the user's check-in sequence together with the temporal and spatial context information, so it is difficult for them to further improve accuracy and meet users' real-time needs. Making full use of the rich context and sequence information, so as to accurately extract and model the user's long-term interests and dynamic contextual interests, is therefore one of the keys to meeting users' real-time needs and improving prediction and recommendation.

Summary of the invention

In view of the above technical problems in the prior art, the present invention provides an interest point prediction method based on a spatiotemporal point process, which can improve the effect and performance of prediction and recommendation.

The present invention comprises the following steps:

(1) Collect the check-in data of all users.

The check-in data of each user is that user's sequence of check-ins at points of interest (POIs). Each check-in is a record (p_i, t_i, c_i), where p_i, t_i and c_i are the POI, the check-in time and the context, respectively. The context c_i comprises a temporal context vector and a spatial context vector. The temporal context vector is the 6-dimensional visit-period vector of the POI (<morning, noon, afternoon, evening, weekday, holiday>), and the spatial context vector is the 2-dimensional geographic location vector (<longitude, latitude>) of the POI. The user set, POI set and context set are denoted U, P and C, respectively.
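The check-in records described in step (1) could be represented as in the following minimal Python sketch. The class name `CheckIn`, the helper `temporal_context`, the 0/1 encoding of the 6-dimensional vector, the time unit and the sample values are all assumptions made for illustration; the text only fixes the slot order and the vector dimensions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Slot order of the 6-dimensional temporal context vector, as listed in the text.
TEMPORAL_SLOTS = ("morning", "noon", "afternoon", "evening", "weekday", "holiday")


@dataclass(frozen=True)
class CheckIn:
    """One record (p_i, t_i, c_i); the context c_i is split into its two vectors."""
    poi: str                          # p_i
    time: float                       # t_i (unit is an assumption, e.g. hours)
    temporal_ctx: Tuple[float, ...]   # 6-dimensional visit-period vector
    spatial_ctx: Tuple[float, float]  # (longitude, latitude)

    def __post_init__(self) -> None:
        if len(self.temporal_ctx) != len(TEMPORAL_SLOTS):
            raise ValueError("temporal context must have 6 entries")
        if len(self.spatial_ctx) != 2:
            raise ValueError("spatial context must be (longitude, latitude)")


def temporal_context(period: str, day_type: str) -> Tuple[float, ...]:
    """Assumed encoding: 1.0 in the slots for the period and the day type, 0.0 elsewhere."""
    if period not in TEMPORAL_SLOTS[:4] or day_type not in TEMPORAL_SLOTS[4:]:
        raise ValueError("unknown period or day type")
    return tuple(1.0 if slot in (period, day_type) else 0.0 for slot in TEMPORAL_SLOTS)


# One user's check-in sequence, ordered by time (made-up sample values).
sequence: List[CheckIn] = [
    CheckIn("cafe", 9.0, temporal_context("morning", "weekday"), (120.34, 30.31)),
    CheckIn("library", 14.5, temporal_context("afternoon", "weekday"), (120.35, 30.32)),
]
```

Keeping the two context vectors as separate fields mirrors the later steps, which compare spatial and temporal contexts with separate similarity functions.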

(2) Based on the check-in sequence of user u_i over POIs, the conditional density function of user u_i, the historical check-in sequence {(p₁,t₁,c₁),(p₂,t₂,c₂),…,(p_{m-1},t_{m-1},c_{m-1})} and the target POI check-in record (p_m,t_m,c_m) is modeled from the following terms: the general interest of user u_i; an exponential function representing time decay; a function representing spatial context similarity; and a function representing temporal context similarity. f(x) = 1/(1+exp(-x)) is the Logistic function, which is used to ensure non-negativity.

The exponential function above is defined with a user-specific parameter α_u, which expresses that the degree to which a historical check-in h influences the target POI p_m differs from user to user.
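A time-decay term of this kind might look like the sketch below. The exact formula is not reproduced in the text, so the form exp(-α_u · (t_m - t_h)) is an assumption, as are the function and argument names.

```python
import math


def time_decay(t_target: float, t_history: float, alpha_u: float) -> float:
    """Assumed form: exp(-alpha_u * (t_target - t_history)).

    alpha_u is the user-specific decay parameter; a larger value makes
    older check-ins lose influence on the target check-in more quickly.
    """
    gap = t_target - t_history
    if gap < 0:
        raise ValueError("the historical check-in must not be later than the target")
    return math.exp(-alpha_u * gap)
```

Under this assumed form the weight is 1 for a zero time gap and shrinks toward 0 as the gap grows, so recent check-ins dominate.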

The spatial context distance function above is defined with a user-specific parameter β_u, which makes the computation of the similarity between spatial contexts personalized, and with the Euclidean distance between the location context vector of the historical check-in POI p_h and the location context vector of the target POI p_m.

The temporal context similarity function above is defined with a user-specific parameter γ_u, which expresses that the degree of influence of the temporal context differs from user to user, and with the Euclidean distance between the temporal context vector of the historical check-in POI p_h and the temporal context vector of the target POI p_m.
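Both context similarity functions rely on a Euclidean distance scaled by a user-specific parameter, so they can be sketched with one shared helper. The kernel form exp(-weight · distance) is an assumption (the formulas are not reproduced in the text), and the function names and sample values are hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two context vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("context vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def context_similarity(ctx_history: Sequence[float],
                       ctx_target: Sequence[float],
                       weight: float) -> float:
    """Assumed kernel: exp(-weight * distance); equals 1.0 for identical contexts."""
    return math.exp(-weight * euclidean_distance(ctx_history, ctx_target))


beta_u, gamma_u = 0.1, 0.5  # hypothetical per-user parameters

# Spatial context: 2-dimensional (longitude, latitude) vectors, distance 5.
spatial = context_similarity((120.0, 30.0), (123.0, 34.0), beta_u)

# Temporal context: 6-dimensional visit-period vectors, distance sqrt(2).
temporal = context_similarity((1, 0, 0, 0, 1, 0), (0, 0, 1, 0, 1, 0), gamma_u)
```

Using the same helper with β_u for locations and γ_u for visit periods keeps the two similarities comparable while still letting each user weight space and time differently.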

(3) Given the POI check-in sequence data of all users, an objective function in logarithmic form is defined. It is expressed in terms of the probability that user u_i is interested in POI p_j, given the POI check-in interaction sequence of user u_i before time t.

(4) Maximize the objective function O to obtain all parameters.
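Steps (3) and (4) can be illustrated with the rough sketch below. It assumes the interest probability is a non-negative score normalized over all POIs and that the objective sums the log-probabilities of each observed check-in given the check-ins before it; neither formula is reproduced in the text, so both are assumptions. The names `interest_probability`, `log_objective` and the caller-supplied `score` function are hypothetical, and contexts are omitted from the history for brevity.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

History = List[Tuple[str, float]]            # (POI, check-in time) pairs
ScoreFn = Callable[[str, str, History], float]


def interest_probability(user: str, poi: str, history: History,
                         pois: Sequence[str], score: ScoreFn) -> float:
    """Assumed normalization: score of `poi` divided by the total score over all POIs."""
    total = sum(score(user, q, history) for q in pois)
    return score(user, poi, history) / total


def log_objective(sequences: Dict[str, History],
                  pois: Sequence[str], score: ScoreFn) -> float:
    """Sum of log-probabilities of every check-in given the check-ins before it."""
    objective = 0.0
    for user, seq in sequences.items():
        for m in range(1, len(seq)):
            target_poi = seq[m][0]
            prob = interest_probability(user, target_poi, seq[:m], pois, score)
            objective += math.log(prob)
    return objective
```

The text does not name an optimization procedure for step (4); under these assumptions any gradient-based optimizer could be used to maximize the objective with respect to the model parameters hidden inside `score`.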

(5) Calculate the user's interest value for each POI in P from the user's historical check-in records. Given the historical interaction records of user u_i and the spatiotemporal context information c^s and c^t, the interest of user u_i in POI p_j is defined from the general interest of user u_i and the user's contextual interest. Here f(x) = log(1+exp(x)) is the Logistic function, used to guarantee that the probability value is non-negative, and t, c^s and c^t are the current time, spatial context and temporal context, respectively.
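The role of f in step (5) can be sketched as follows. The function f(x) = log(1 + exp(x)), which the text calls the Logistic function, is commonly known as softplus and is strictly positive for every real input. How the general and contextual interests are combined is not reproduced in the text; adding them before applying f is an assumption, and the function names are hypothetical.

```python
import math


def softplus(x: float) -> float:
    """f(x) = log(1 + exp(x)), written in a form that avoids overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def interest(general_interest: float, contextual_interest: float) -> float:
    """Assumed combination: f(general interest + contextual interest)."""
    return softplus(general_interest + contextual_interest)
```

Because f is monotonically increasing, applying it does not change the relative order of POIs; it only maps the raw sum to a positive value.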

(6) Sort all POIs in the database from high to low by the user's interest value, and recommend the several POIs with the highest interest values to the user. In the sorting formula, u denotes the target user, and p_i ∈ P and p_i′ ∈ P are POIs in the database.
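The ranking in step (6) amounts to a top-k selection over the interest values. A minimal sketch, assuming the interest values for the target user u have already been computed and stored in a dictionary (the name `recommend_top_k` and the sample values are hypothetical):

```python
import heapq
from typing import Dict, List


def recommend_top_k(interest_by_poi: Dict[str, float], k: int) -> List[str]:
    """Return the k POIs with the highest interest values, highest first."""
    return heapq.nlargest(k, interest_by_poi, key=interest_by_poi.get)


# Made-up interest values for one target user.
scores = {"cafe": 0.2, "library": 0.9, "park": 0.5}
top_two = recommend_top_k(scores, 2)
```

`heapq.nlargest` avoids sorting the full POI set when only a few recommendations are needed; a full `sorted(..., reverse=True)` would give the same order.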

The present invention is the first to use a point process model to integrate temporal and spatial context information, providing a reliable method for context-aware behavior modeling and prediction. It models and predicts the user's general interest and contextual interest from the spatiotemporal information in the user's check-in sequence, providing an accurate method for the difficult task of extracting and modeling user interest preferences. By integrating spatiotemporal context and sequence information with a point process model, the invention can improve the effect of prediction and recommendation.

Description of drawings

FIG. 1 is a schematic diagram of the system architecture of the present invention.

FIG. 2 is a schematic diagram of the user preference prediction process of the present invention.

Detailed description

To describe the present invention more specifically, its technical solutions are explained in detail below with reference to the accompanying drawings and specific embodiments.

The interest point prediction algorithm of the present invention, based on the spatiotemporal point process, includes the following steps:

(1) Collect the check-in data of all users.

The check-in data of each user is that user's sequence of check-ins at points of interest (POIs), where p_i, t_i and c_i are the POI, the check-in time and the context, respectively. The context c_i comprises a temporal context vector and a spatial context vector. The temporal context vector is the 6-dimensional visit-period vector of the POI (<morning, noon, afternoon, evening, weekday, holiday>), and the spatial context vector is the 2-dimensional geographic location vector (<longitude, latitude>) of the POI. The user set, POI set and context set are denoted U, P and C, respectively.

(2) Based on the check-in sequence of user u_i over POIs, the conditional density function is modeled from the following terms: the general interest of user u_i; an exponential function representing time decay; a similarity function representing the spatial context; and a function representing temporal context similarity. f(x) = 1/(1+exp(-x)) is the Logistic function, which is used to ensure non-negativity.

The exponential function above and the spatial context distance function above are each defined by their own formula. The spatial context distance function uses the Euclidean distance between the location context vector of the historical check-in POI p_h and the location context vector of the target POI p_m.

The temporal context similarity function above is defined with a user-specific parameter γ_u, which expresses that the degree of influence of the temporal context differs from user to user, and with the Euclidean distance between the temporal context vector of the historical check-in POI p_h and the temporal context vector of the target POI p_m.

(3) Given the POI check-in sequence data of all users, an objective function in logarithmic form is defined. It is expressed in terms of the probability that user u_i is interested in POI p_j, given the POI check-in interaction sequence of user u_i before time t. In that definition, f(x) = log(1+exp(x)) is the Logistic function, used to guarantee that the probability value is non-negative; the probability is built from the general interest of user u_i and the user's contextual interest; and t, c^s and c^t are the current time, spatial context and temporal context, respectively.

FIG. 1 shows the architecture of the interest point prediction method based on the spatiotemporal point process in this embodiment. The method is divided into two main modules: a preprocessing module and a prediction module. The preprocessing module first obtains the check-in record sequences and spatiotemporal context information of all users, then uses a point process model to integrate the spatiotemporal context information and model the users' check-in sequences, yielding an interest model based on the spatiotemporal point process. The prediction module first obtains the check-in sequence and context information from the POI check-in data of the target user, then uses the interest model to infer the user's interests and predict the user's next check-in. FIG. 2 shows the detailed steps of user preference prediction: the user's historical check-in data and context information are obtained first, and the preference of the target user u for each POI is then computed with the interest model based on the spatiotemporal point process.

The above description of the embodiments is intended to help those of ordinary skill in the art understand and apply the present invention. Those skilled in the art can readily make various modifications to the above embodiments and apply the general principles described herein to other embodiments without inventive effort. The present invention is therefore not limited to the above embodiments, and improvements and modifications made by those skilled in the art on the basis of this disclosure shall fall within the protection scope of the present invention.

Claims

1. An interest point prediction method based on a spatiotemporal point process, characterized by comprising the following steps:

step (1): collecting the check-in data of all users;

the check-in data of each user is the user's check-in sequence over points of interest (POIs), wherein p_i, t_i and c_i are the POI, the check-in time and the context, respectively, and c_i includes a temporal context vector and a spatial context vector; the user set, POI set and context set are denoted U, P and C, respectively;

step (2): according to the check-in sequence of user u_i over POIs, modeling the conditional density function of user u_i, the historical check-in sequence {(p₁,t₁,c₁),(p₂,t₂,c₂),…,(p_{m-1},t_{m-1},c_{m-1})} and the target POI check-in record (p_m,t_m,c_m), wherein the model comprises: the general interest of user u_i; an exponential function representing time decay; a similarity function representing the spatial context; and a function representing temporal context similarity; and f(x) = 1/(1+exp(-x)) is a Logistic function used to ensure non-negativity;

step (3): given the POI check-in data of all users, defining an objective function in logarithmic form in terms of the probability that user u_i is interested in POI p_j, given the POI check-in interaction sequence of user u_i before time t;

step (4): maximizing the objective function O to obtain all parameters;

step (5): calculating the user's interest value for each POI in P according to the user's historical check-in records;

step (6): sorting all POIs in the database from high to low according to the user's interest values, and recommending to the user the several POIs with the highest predicted interest values.

2. The interest point prediction method based on a spatiotemporal point process of claim 1, wherein the exponential function of step (2) is defined with a user-specific parameter α_u, which expresses that the degree to which a historical check-in h influences the target POI p_m differs from user to user.

3. The interest point prediction method based on a spatiotemporal point process of claim 1, wherein the spatial context distance function of step (2) is defined with a user-specific parameter β_u, which makes the computation of the similarity between spatial contexts personalized, and with the Euclidean distance between the location context vector of the historical check-in POI p_h and the location context vector of the target POI p_m.

4. The interest point prediction method based on a spatiotemporal point process of claim 1, wherein the temporal context similarity function of step (2) is defined with a user-specific parameter γ_u, which expresses that the degree of influence of the temporal context differs from user to user, and with the Euclidean distance between the temporal context vector of the historical check-in POI p_h and the temporal context vector of the target POI p_m.

5. The interest point prediction method based on a spatiotemporal point process of claim 1, wherein in step (3) the probability that user u_i is interested in POI p_j, given the POI check-in interaction sequence of user u_i before time t, is defined by a formula.

6. The interest point prediction method based on a spatiotemporal point process of claim 1, wherein in step (5), given the historical interaction records of user u_i and the spatiotemporal context information c^s and c^t, the interest of user u_i in POI p_j is defined from the general interest of user u_i and the contextual interest of the user, wherein f(x) = log(1+exp(x)) is a Logistic function used to guarantee that the probability value is non-negative, and t, c^s and c^t are the current time, spatial context and temporal context, respectively.

7. The interest point prediction method based on a spatiotemporal point process of claim 1, wherein the sorting in step (6) is computed with a formula in which u denotes the target user, and p_i ∈ P and p_i′ ∈ P are POIs in the database.