CN105138653B


Info

Publication number: CN105138653B
Application number: CN201510540419.0A
Authority: CN
Inventors: 于瑞国; 刘志强; 王建荣; 喻梅; 于健; 赵满坤
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2015-08-28
Filing date: 2015-08-28
Publication date: 2018-08-21
Anticipated expiration: 2035-08-28
Also published as: CN105138653A

Abstract


The invention discloses a problem recommendation method based on typicality and difficulty. The method includes: computing each target user's performance on each problem type, including the number of problems, their difficulty, and the target user's pass rate on that type; computing the similarity between the feature vectors of any two target users and, for each target user, selecting the several most similar users as nearest neighbors; predicting the target user's score for problems not yet attempted from the nearest neighbors' problem-solving records; and, for the target user, selecting the several highest-scoring target problems as the recommendation result. The device includes a calculation module, a first selection module, a scoring module, and a second selection module. The invention exploits the fact that problems have a "difficulty" to improve on traditional recommendation methods, and obtains better results. It could be applied to learning websites to help users choose learning content and build personalized study plans.

Description


Technical field

The invention relates to data mining, machine learning, and information retrieval, specifically to collaborative filtering recommendation, and in particular to a problem recommendation method based on typicality and difficulty and a corresponding recommendation device.

Background

Recommender systems and recommendation algorithms are by now fairly mature. Mainstream algorithms fall into three groups: collaborative filtering (CF), content-based (CB) recommendation, and hybrid methods. Hybrid methods combine the first two to achieve better results.

In a content-based recommender system, an item is described as a vector of attribute values, and the attribute values that characterize the item are called its "content". Such a system infers a user's preferences from the user's rating history and recommends items that resemble those preferences. The approach is mainly used for items that can be described in text, such as documents and news.

Collaborative filtering predicts a user's rating of an unseen item from the similarity between users or between items. Methods that rely mainly on user similarity are called user-based nearest-neighbor recommendation; those that rely mainly on item similarity are called item-based nearest-neighbor recommendation. Collaborative filtering rests on the assumption that a user's interests stay stable over a long period.

Both content-based and traditional collaborative filtering systems fall short when applied to knowledge recommendation. A content-based system must describe the features of the recommended items accurately and match them to user preferences; for data with vague features, such as problems, an accurate description is hard to obtain, and the recommendations become inaccurate. Traditional collaborative filtering needs no accurate item description but does need a large amount of rating data, which is rarely available when recommending knowledge or problems.

Summary of the invention

The invention provides a problem recommendation method based on typicality and difficulty, and a corresponding recommendation device. It addresses the technical problems that arise when traditional recommendation techniques are applied to knowledge recommendation: item features are hard to describe, rating information is sparse, and problem difficulty, an important feature, is not adequately considered. Details follow.

A problem recommendation method based on typicality and difficulty comprises the following steps:

computing each target user's performance on each problem type, including the number of problems, their difficulty, and the target user's pass rate on that type;

computing, from the target user feature vectors, the similarity between the feature vectors of any two target users and, for each target user, selecting the several most similar users as nearest neighbors;

predicting the target user's score for problems not yet attempted from the nearest neighbors' problem-solving records;

for the target user, selecting the several highest-scoring target problems as the recommendation result.

The method further comprises:

removing erroneous or invalid data, then sorting the records by submission time in ascending order;

collecting the users and problems that appear in the data to form a user set and a problem set.

The target user feature vector is:

$\langle type_1 : typicality_1,\ type_2 : typicality_2,\ \ldots,\ type_i : typicality_i,\ \ldots,\ type_n : typicality_n \rangle$

where $type_i$ is a problem type, $typicality_i$ is the target user's typicality on problems of type $type_i$, $i$ is the index of the problem type, and $n$ is the total number of problem types.

A problem recommendation device based on typicality and difficulty comprises:

a calculation module, configured to compute each target user's performance on each problem type, including the number of problems, their difficulty, and the target user's pass rate on that type;

a first selection module, configured to compute, from the target user feature vectors, the similarity between the feature vectors of any two target users and, for each target user, to select the several most similar users as nearest neighbors;

a scoring module, configured to predict the target user's score for problems not yet attempted from the nearest neighbors' problem-solving records;

a second selection module, configured to select, for the target user, the several highest-scoring target problems as the recommendation result.

The device further comprises:

a preprocessing module, configured to remove erroneous or invalid data, sort the records by submission time in ascending order, and collect the users and problems that appear in the data to form a user set and a problem set.

The technical solution provided by the invention has the following beneficial effects:

1. The invention offers a new approach to applying recommender systems in the learning process. It introduces the difficulty of the problems a user has done into the traditional user feature representation as a proxy for ability level, which improves, to some extent, the performance of traditional recommendation methods on problem recommendation.

2. Compared with user-based collaborative filtering, the invention describes user features better and yields more accurate problem recommendations, helping users choose learning content and study more efficiently.

Brief description of the drawings

Fig. 1 is a flowchart of a problem recommendation method based on typicality and difficulty;

Fig. 2 is a schematic diagram of the effect of the proportion in which the difficulty coefficient is introduced;

Fig. 3 is a comparison of methods under distance similarity;

Fig. 4 is a comparison of methods under cosine similarity;

Fig. 5 is a schematic structural diagram of a problem recommendation device based on typicality and difficulty;

Fig. 6 is another schematic structural diagram of a problem recommendation device based on typicality and difficulty.

In the drawings, the reference numerals denote the following components:

1: calculation module; 2: first selection module;

3: scoring module; 4: second selection module;

5: preprocessing module.

Detailed description

To make the purpose, technical solution, and advantages of the invention clearer, embodiments of the invention are described in further detail below. For convenience, the user to whom problems are to be recommended is called the target user, and a problem to be recommended is called a target problem.

Embodiment 1

This embodiment provides a problem recommendation method based on typicality and difficulty. Referring to Fig. 1, the method comprises the following steps:

101: Preprocess the data.

The initial data processed in this embodiment are the submission records of users solving problems. Erroneous or invalid records (for example, records missing key data items) are removed first, and the remaining records are sorted by submission time in ascending order. The users and problems that appear in the data are then collected to form a user set and a problem set.
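The preprocessing in step 101 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `Submission` record layout and all field names are hypothetical, and "invalid" is assumed here to mean only "a key field is missing".

```python
from typing import Iterable, List, NamedTuple, Optional, Set, Tuple


class Submission(NamedTuple):
    # Hypothetical record layout; the source does not name the fields.
    user_id: Optional[str]
    problem_id: Optional[str]
    passed: Optional[bool]
    submitted_at: Optional[float]  # e.g. a Unix timestamp


def preprocess(
    records: Iterable[Submission],
) -> Tuple[List[Submission], Set[str], Set[str]]:
    """Drop records with a missing field, sort by time, collect users and problems."""
    # Assumption: a record is invalid only when one of its fields is None.
    valid = [r for r in records if all(v is not None for v in r)]
    # Ascending submission time, as described in step 101.
    valid.sort(key=lambda r: r.submitted_at)
    users = {r.user_id for r in valid}
    problems = {r.problem_id for r in valid}
    return valid, users, problems
```

The returned user set and problem set delimit the data that every later step works on.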

102: Classify the problem set and compute the difficulty of each problem to generate problem descriptions.

In this embodiment, the problem set is classified by the problems' type tags (such as "dynamic programming", "computational geometry", "combinatorics", ...). The difficulty of each problem is also computed. Difficulty is described as a function of two factors, the number of submissions and the pass rate, both of which can be obtained by counting over the data produced in step 101. Once the difficulty and type are known, each problem is described as a pair: (type, difficulty).

103: Compute each target user's performance on each problem type, that is, the number of problems of that type the user has done, their difficulty, and the user's pass rate on that type.

From this performance, compute the target user's typicality on that problem type. The typicalities on all problem types together form the target user's feature description (that is, the feature vector):

$\langle type_1 : typicality_1,\ type_2 : typicality_2,\ \ldots,\ type_i : typicality_i,\ \ldots,\ type_n : typicality_n \rangle$

where $type_i$ is a problem type, $typicality_i$ is the target user's typicality on problems of type $type_i$, $i$ is the index of the problem type, and $n$ is the total number of problem types.

104: From the target user feature vectors, compute the similarity between the feature vectors of any two target users and, for each target user, select the several most similar users as its "nearest neighbors".

Using the feature vectors formed in step 103, first compute the similarity between the feature vectors of any two target users and take it as the similarity between those users; then, for each target user, select the several most similar users as its nearest neighbors. These are the nearest neighbors of traditional collaborative filtering, and they serve as the main basis for predicting the target user's score for a target problem.

105: Predict the target user's score for problems not yet attempted from the nearest neighbors' problem-solving records.

In this embodiment, for a target problem and a target user, the number of the user's nearest neighbors who have done the target problem is counted and normalized to [0, 1]; the result is the user's predicted score for that problem.
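A minimal Python sketch of this counting rule follows. Dividing the count by the number of neighbors is one assumed way to normalize to [0, 1]; the source does not state the normalization. The function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, Set


def predict_scores_by_count(
    neighbors: Iterable[str],
    done: Dict[str, Set[str]],
    already_done: Set[str],
) -> Dict[str, float]:
    """Score each unattempted problem by the share of neighbors who did it."""
    neighbor_list = list(neighbors)
    counts: Dict[str, int] = {}
    for nb in neighbor_list:
        for problem in done.get(nb, set()):
            # Only problems the target user has not done are candidates.
            if problem not in already_done:
                counts[problem] = counts.get(problem, 0) + 1
    k = len(neighbor_list)
    if k == 0:
        return {}
    # Assumed normalization: divide by the neighbor count so scores lie in [0, 1].
    return {problem: c / k for problem, c in counts.items()}
```

With two neighbors, a problem done by both scores 1.0 and a problem done by one scores 0.5.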

106: For the target user, select the several highest-scoring target problems as the recommendation result.

Step 106 builds on steps 105 and 103. The more frequently a target user solves problems, the more problems are recommended to that user; the frequency is obtained in step 103. Once the number of recommendations is fixed, that many problems with the highest predicted scores from step 105 are selected as the recommendation result.

Taking the users' last submission time in the experimental data as the boundary, the invention collects the target users' problem-solving activity over a subsequent period and evaluates recommendation accuracy by comparing the recommendations with that actual activity.

In summary, through steps 101 to 106 this embodiment takes problem difficulty fully into account, improves on traditional recommendation methods, and obtains better results.

Embodiment 2

The technical solution of Embodiment 1 is described in detail below with specific formulas and examples.

201: Preprocess the data.

That is, remove invalid records, for example records missing key data items, and collect the initial user set and problem set to fix the scope of the data processed in this embodiment.

202: Classify the problem set and compute the difficulty of each problem.

In general, the more people attempt a problem, the easier it is, and the higher its pass rate, the easier it is. Of the two, the former matters more: some problems have a high pass rate but only a handful of submitters, and such problems are often not easy. Problem categories also differ from one another; computational geometry problems, for example, require a lot of code and involve many details, so the overall pass rate of the category is low. Taking these observations together, the difficulty coefficient of a problem is computed by formula (1).

In formula (1), $j$ is the index of the category to which problem $p_i$ belongs; $mxSub_j$ is the submission count of the most-submitted problem in category $j$; $mxAc_j$ is the pass rate of the problem with the highest pass rate in category $j$; $subCnt_i$ and $acRate_i$ are the submission count and pass rate of problem $p_i$. The difficulty coefficient computed this way lies in [0, 1].
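Formula (1) itself is not reproduced in this text, so the following Python sketch is only one possible function with the stated properties, not the patent's formula. It assumes a weighted sum of the two category-normalized factors, with the submission count weighted more heavily; the weight `w = 0.6` and all names are hypothetical.

```python
def difficulty(
    sub_cnt: int,
    ac_rate: float,
    mx_sub: int,
    mx_ac: float,
    w: float = 0.6,
) -> float:
    """Illustrative difficulty coefficient in [0, 1]; NOT the patent's formula (1).

    sub_cnt, ac_rate: submission count and pass rate of the problem.
    mx_sub, mx_ac: category maxima of submission count and pass rate.
    w: assumed weight of the submission-count factor (w > 0.5 makes it dominate).
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    # Normalize each factor by its category maximum so both lie in [0, 1].
    sub_part = sub_cnt / mx_sub if mx_sub > 0 else 0.0
    ac_part = ac_rate / mx_ac if mx_ac > 0 else 0.0
    # More submissions and a higher pass rate both mean an easier problem.
    ease = w * sub_part + (1.0 - w) * ac_part
    return 1.0 - ease
```

Under this assumed form, the most-submitted problem with the highest pass rate in its category gets difficulty 0, and a problem with no submissions and no passes gets difficulty 1.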

203: Compute the user feature vectors.

A user's feature vector consists of the user's typicality on each problem category.

According to the TyCo method (Cai Y, Leung H-f, Li Q, et al. TyCo: Towards Typicality-based Collaborative Filtering Recommendation[C]// 2010 22nd International Conference on Tools with Artificial Intelligence. IEEE Computer Society, 2010: 97-104), a user's typicality in the user group corresponding to a problem category depends on two factors: the user's average rating of problems in that category, and how frequently the user does problems in that category. The higher the rating, the more interested the user is in the category; the more frequently the user does such problems, the more interested the user is. One value is computed from each factor, and the typicality is the average of the two. The typicality $v_{i,j}$ of user $u_i$ in the user group corresponding to category $j$ is computed by formula (2).

In formula (2), $R_{i,j}$ is user $u_i$'s average rating of the category-$j$ problems done; $S_{i,j}$ is the number of category-$j$ problems done by $u_i$; $S_{i,k}$ is the number of category-$k$ problems done by $u_i$; and $n$ is the total number of problem categories.

In this application scenario, however, each problem has only two outcomes for a user, passed or not passed, rated 1 and 0 respectively. As a result, the user's average rating $R_{i,j}$ for every category is 1, which reflects neither the user's preferences nor the user's ability level.

TyCo was designed mainly for predicting movie ratings, whereas the method proposed in this embodiment (DF_TyCo) is mainly for problem recommendation. Difficulty is an important feature specific to problems, and the difficulty of the problems a user does reflects, to some extent, the user's ability level. The user typicality in this embodiment is computed by formula (3).

In formula (3), $n$ is the total number of problem categories; $D_{i,j}$ is the sum of the difficulties of the category-$j$ problems done by user $u_i$; $S_{i,j}$ is the number of category-$j$ problems done by $u_i$; $\beta$ is an undetermined coefficient whose optimal value is determined experimentally; $S_{i,k}$ is the number of category-$k$ problems done by $u_i$; and $D_{i,k}$ is the sum of the difficulties of the category-$k$ problems done by $u_i$.

Compared with formula (2), formula (3) drops the average rating factor and adds the problem difficulty factor; this is the improvement of DF_TyCo over TyCo for problem recommendation.
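Formula (3) is not reproduced in this text. The Python sketch below assumes one form consistent with the symbols defined above: a convex combination, weighted by $\beta$, of the user's share of difficulty in a category and the user's share of problems done in that category. This form is an assumption, and the function and parameter names are hypothetical; the default `beta=0.85` is the best weight factor reported in Embodiment 3.

```python
from typing import Dict


def typicality_vector(
    counts: Dict[str, int],
    difficulty_sums: Dict[str, float],
    beta: float = 0.85,
) -> Dict[str, float]:
    """Assumed form: v_j = beta * D_j / sum_k D_k + (1 - beta) * S_j / sum_k S_k.

    counts: S_j, number of problems the user did in each category.
    difficulty_sums: D_j, summed difficulty of those problems per category.
    """
    total_s = sum(counts.values())
    total_d = sum(difficulty_sums.values())
    vector: Dict[str, float] = {}
    for category, s in counts.items():
        freq_share = s / total_s if total_s else 0.0
        d = difficulty_sums.get(category, 0.0)
        diff_share = d / total_d if total_d else 0.0
        vector[category] = beta * diff_share + (1.0 - beta) * freq_share
    return vector
```

Under this assumed form the typicalities of a user sum to 1 across categories, so feature vectors of different users are directly comparable.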

204: Compute user similarity.

This step uses the user feature vectors to compute the similarity between any two users. This embodiment uses both the cosine similarity formula and the distance similarity formula and compares their effects experimentally. The two input vectors of the similarity formula are the feature vectors of the two users, namely $\langle v_{i,1}, v_{i,2}, \ldots, v_{i,n} \rangle$ and $\langle v_{j,1}, v_{j,2}, \ldots, v_{j,n} \rangle$, where $v_{i,k}$ is the typicality of user $u_i$ on category $k$ and $v_{j,k}$ is the typicality of user $u_j$ on category $k$. Once the input vectors are fixed, the similarity computation itself is well known to those skilled in the art and is not detailed here.
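For reference, the two similarity measures can be sketched in Python as below. Cosine similarity is standard. The source does not define "distance similarity", so the form $1 / (1 + d)$ with Euclidean distance $d$ is an assumption made here for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def distance_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed form 1 / (1 + Euclidean distance); identical vectors give 1.0."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)
```

Both functions return larger values for more similar users, so either can be used unchanged when ranking neighbors.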

205: Select the nearest neighbors.

Once the similarities between users are known, the several users most similar to each user are selected as that user's nearest neighbors. This embodiment selects a fixed number of similar users for each user. Too many or too few nearest neighbors both degrade the recommendations; this embodiment chooses an optimal number through repeated comparative experiments and places no limit on the number of nearest neighbors.
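Selecting a fixed number K of nearest neighbors can be sketched as follows. The function name and the way the similarity function is passed in are illustrative choices, not taken from the source.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple


def nearest_neighbors(
    target: str,
    vectors: Dict[str, Sequence[float]],
    similarity: Callable[[Sequence[float], Sequence[float]], float],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k users most similar to `target` as (user_id, similarity) pairs."""
    target_vec = vectors[target]
    # Pair each other user with their similarity to the target user.
    candidates = (
        (similarity(target_vec, vec), uid)
        for uid, vec in vectors.items()
        if uid != target
    )
    # nlargest keeps only the k best pairs, ordered from most to least similar.
    top = heapq.nlargest(k, candidates)
    return [(uid, sim) for sim, uid in top]
```

Keeping the similarity value next to each neighbor is convenient because the score prediction in the next step uses it as a weight.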

206: Predict the user's scores for problems.

The target user's score for a target problem is computed from the user's nearest neighbors. Problems that the nearest neighbors have done are problems the target user may also do, and the more similar a neighbor is, the more informative that neighbor's record. In this embodiment, every user is taken to rate each problem that the user submitted and passed as 1 and every other problem as 0; a weighted average of these ratings, with the similarity between each nearest neighbor and the target user as the weight, gives the target user's predicted score for the target problem.
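The similarity-weighted average described here can be sketched in Python as below. The names are hypothetical; `passed` maps each user to the set of problems that user submitted and passed, which corresponds to a rating of 1.

```python
from typing import Dict, Sequence, Set, Tuple


def predict_score(
    problem: str,
    neighbors: Sequence[Tuple[str, float]],
    passed: Dict[str, Set[str]],
) -> float:
    """Weighted average of neighbors' 0/1 ratings, weighted by similarity."""
    total_weight = sum(sim for _, sim in neighbors)
    if total_weight <= 0.0:
        return 0.0
    # A neighbor contributes its weight only if it passed the problem (rating 1).
    hit_weight = sum(
        sim for uid, sim in neighbors if problem in passed.get(uid, set())
    )
    return hit_weight / total_weight
```

Because the ratings are 0 or 1 and the weights are non-negative, the predicted score always lies in [0, 1].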

207: Select the problems to recommend.

Two approaches are possible: choose a threshold and recommend the problems whose predicted score for the target user exceeds it, or recommend a fixed number of problems to each target user. This embodiment takes the fixed-number approach as its example.
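Both selection strategies are short enough to sketch in Python. The function names are hypothetical, and breaking score ties by problem identifier is an added convention so that the output is deterministic.

```python
from typing import Dict, List


def recommend_top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Fixed-number strategy: the n problems with the highest predicted scores."""
    # Sort by descending score; ties are broken by problem id (an added convention).
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [problem for problem, _ in ranked[:n]]


def recommend_above(scores: Dict[str, float], threshold: float) -> List[str]:
    """Threshold strategy: every problem whose predicted score exceeds the threshold."""
    return sorted(p for p, s in scores.items() if s > threshold)
```

The fixed-number strategy guarantees every user receives recommendations, while the threshold strategy may return nothing for a user whose neighbors overlap little.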

The experimental data are split by time into a training set and a test set. Recommendations are generated from the training set and compared against the test set to measure accuracy.

In summary, through steps 201 to 207 this embodiment exploits the fact that problems have a "difficulty" to improve on traditional recommendation methods, and obtains better results. It could be applied to learning websites to help users choose learning content and build personalized study plans.

Embodiment 3

The feasibility of the technical solutions of Embodiments 1 and 2 is verified below with specific examples, formulas, and drawings.

In the experiments, the number of nearest neighbors K is set to 5, 10, ..., 70 in turn to study its effect on the results; for a given K, the weight factor of problem difficulty in the user feature description is varied, and the results and their accuracy are recorded.

The invention evaluates the results with the F-measure and the user accuracy rate (PU). For convenience, "recommending one problem $p_y$ to user $u_x$" is called one recommendation.

If the total number of recommendations produced is recomNum, the number of accurate recommendations is accuate, and the total number of problems done by all users in the test set is realSum, then the precision pred and the recall recall are computed by formulas (4) and (5), respectively.

The F-measure is computed from pred and recall by formula (6).

pred and recall each reflect the quality of the results from one side, and the F-measure combines the two; in the experiments, a larger F-measure means a better result.
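Formulas (4) to (6) are not reproduced in this text. Assuming they follow the standard definitions of precision, recall, and the balanced F-measure (an assumption, though it matches the quantities named above), they would read:

```latex
\begin{align}
\mathrm{pred}   &= \frac{\mathit{accuate}}{\mathit{recomNum}} \tag{4} \\
\mathrm{recall} &= \frac{\mathit{accuate}}{\mathit{realSum}} \tag{5} \\
F &= \frac{2 \cdot \mathrm{pred} \cdot \mathrm{recall}}{\mathrm{pred} + \mathrm{recall}} \tag{6}
\end{align}
```

For example, with 1 accurate recommendation out of 4 produced and 3 problems actually done in the test set, pred = 1/4, recall = 1/3, and F = 2/7.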

The user accuracy rate is the percentage of target users who receive an accurate recommendation among all target users. Each target user is recommended a fixed 10 problems; if at least one of them is accurate for a given target user, that user is said to have been accurately recommended. The user accuracy rate is computed by formula (7).

In formula (7), predUser is the number of accurately recommended users and allUser is the total number of users who received recommendations.
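The evaluation can be sketched in Python as follows. The sketch assumes that PU is the ratio predUser / allUser, that pred, recall, and F follow the standard precision, recall, and balanced F-measure definitions, and that a recommendation counts as accurate when the recommended problem appears among the problems the user did in the test set. All function and variable names are hypothetical.

```python
from typing import Dict, List, Set


def evaluate(
    recommended: Dict[str, List[str]],
    actual: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Compute pred, recall, F and PU from recommendations and test-set activity."""
    recom_num = sum(len(problems) for problems in recommended.values())
    real_sum = sum(len(problems) for problems in actual.values())
    accurate = 0
    pred_user = 0
    for uid, problems in recommended.items():
        # Assumed: a hit is a recommended problem the user did in the test set.
        hits = len(set(problems) & actual.get(uid, set()))
        accurate += hits
        if hits > 0:
            pred_user += 1  # at least one accurate recommendation for this user
    pred = accurate / recom_num if recom_num else 0.0
    recall = accurate / real_sum if real_sum else 0.0
    f_value = 2 * pred * recall / (pred + recall) if (pred + recall) else 0.0
    pu = pred_user / len(recommended) if recommended else 0.0
    return {"pred": pred, "recall": recall, "F": f_value, "PU": pu}
```

PU is more forgiving than the F-measure: a user with a single hit among ten recommendations counts fully toward PU but contributes only one accurate recommendation to pred and recall.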

As shown in Fig. 2, user features are described as the typicality on each problem category. Introducing the difficulty coefficient means making the difficulty of the problems a user has done part of the user features. Fig. 2 shows that introducing difficulty improves the experimental results, so including problem difficulty in recommendation is reasonable. As the weight factor increases from 0 to 1.0, the user accuracy rate first rises and then falls; it peaks when the weight factor is around 0.85, that is, when problem difficulty accounts for 85% of the user features and the remaining factors for 15%. This shows that, in this application scenario, introducing problem difficulty improves on traditional recommendation.

TyCo is a typicality-based collaborative filtering method; introducing it into a recommender system improves the experimental results of traditional recommendation methods. On that basis, this embodiment introduces problem difficulty and further improves the experimental results when the recommender system is applied to problem recommendation.

Compared with UBCF, TyCo introduces the concept of typicality and improves on traditional collaborative filtering; this embodiment builds on TyCo by introducing problem difficulty and improves the results again. Figs. 3 and 4 compare this embodiment with typicality-based collaborative filtering (TyCo) and user-based collaborative filtering (UBCF) under the two similarity formulas. For every value of K, this method (DF_TyCo) achieves better results than the other methods, which indicates that it is more accurate than the other two.

Embodiment 4

A problem recommendation device based on typicality and difficulty, referring to Fig. 5, comprises:

a calculation module 1, configured to compute each target user's performance on each problem type, including the number of problems, their difficulty, and the target user's pass rate on that type;

a first selection module 2, configured to compute, from the target user feature vectors, the similarity between the feature vectors of any two target users and, for each target user, to select the several most similar users as nearest neighbors;

a scoring module 3, configured to predict the target user's score for problems not yet attempted from the nearest neighbors' problem-solving records;

a second selection module 4, configured to select, for the target user, the several highest-scoring target problems as the recommendation result.

Referring to Fig. 6, the device further comprises:

a preprocessing module 5, configured to remove erroneous or invalid data, sort the records by submission time in ascending order, and collect the users and problems that appear in the data to form a user set and a problem set.

Through the above modules, this embodiment exploits the fact that problems have a "difficulty" to improve on traditional recommendation methods, and obtains better results. It could be applied to learning websites to help users choose learning content and build personalized study plans.

Unless otherwise specified, the embodiments of the invention place no limit on the models of the devices used; any device that can perform the functions described above may be used.

Those skilled in the art will understand that the drawings are schematic diagrams of a preferred embodiment only, and that the numbering of the embodiments above is for description only and does not indicate their relative merits.

The above are only preferred embodiments of the invention and do not limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention falls within its scope of protection.

Claims


for the target user, selecting the several highest-scoring target problems as the recommendation result for the target user;

wherein the target user feature vector is specifically the target user's typicality on each problem category;

where $type_i$ is a problem type; $typicality_i$ is the target user's typicality on problems of type $type_i$; $i$ is the index of the problem type; $n$ is the total number of problem types;

the typicality is computed as follows:

2. The problem recommendation method based on typicality and difficulty according to claim 1, characterized in that the method further comprises:

3. A recommendation device for the problem recommendation method based on typicality and difficulty according to any one of claims 1-2, characterized in that the recommendation device comprises: