CN104462448A

Info

Publication number: CN104462448A
Application number: CN201410779559.9A
Authority: CN
Inventors: 李金奎
Original assignee: Weibo Internet Technology China Co Ltd
Current assignee: Weibo Internet Technology China Co Ltd
Priority date: 2014-12-15
Filing date: 2014-12-15
Publication date: 2015-03-25
Anticipated expiration: 2034-12-15
Also published as: CN104462448B

Abstract

The present application discloses a group name classification method that addresses the problem that prior-art grouping approaches may cause inaccurate information to be recommended to users. The method mainly includes: obtaining a group name to be classified; determining a feature value of the group name according to the features of the followed objects contained in the groups that bear that name; and classifying the group name according to its feature value. A group name classification device is also disclosed.

Description

Method and device for classifying group names

Technical field

The present application relates to the field of computer technology, and in particular to a group name classification method and device.

Background

A group name generally refers to the name that a social-network user gives to a group after dividing the objects they follow (followed objects) into different groups, either according to their relationship with those objects or according to the reason they became interested in them. Because group names often reflect a user's personal interests and social relationships, they can generally be divided into two categories, a "relationship category" and an "interest category", which contain group names related to the user's social relationships and group names related to the user's interests, respectively.

In the prior art, in order to recommend information related to a user's interests or social relationships, the category of each group name the user has set on the social network is first determined from the semantics of the name, and relevant information is then recommended according to that category. The drawback of this approach can be seen with a group named "colleagues": if the followed objects in that group are all film and television stars, semantic analysis alone may still place the name "colleagues" directly in the relationship category, so the information recommended to the user on the basis of that category is inaccurate.

Summary of the invention

An embodiment of the present application provides a group name classification method to solve the problem that prior-art grouping approaches may cause inaccurate information to be recommended to users.

An embodiment of the present application also provides a group name classification device to solve the same problem.

The embodiments of the present application adopt the following technical solutions:

A group name classification method, mainly including:

obtaining a group name to be classified;

determining a feature value of the group name to be classified according to the features of the followed objects contained in the group represented by that group name;

classifying the group name to be classified according to its feature value.

A group name classification device, including:

an acquisition unit, configured to obtain a group name to be classified;

a determining unit, configured to determine a feature value of the group name to be classified according to the features of the followed objects contained in the group represented by that group name;

a classification unit, configured to classify the group name to be classified according to its feature value.

At least one of the above technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects:

Because the feature value of the group name to be classified is determined from the features of the followed objects contained in the group represented by that name, and the group name is classified according to that feature value, the classification result matches the features of the followed objects in the group. This solves the prior-art problem that determining the category of a group name from semantics alone leads to inaccurate information being recommended to users.

Brief description of the drawings

The drawings described here provide a further understanding of the present application and form a part of it. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:

Fig. 1 is a flowchart of a group name classification method provided by an embodiment of the present application;

Fig. 2 is a flowchart of a method for classifying group names to be classified using a decision tree model, provided by an embodiment of the present application;

Fig. 3 is a structural block diagram of a group name classification device provided by an embodiment of the present application.

Detailed description

To make the purpose, technical solutions, and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of this application.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.

Embodiment 1

To solve the problem that prior-art grouping approaches may cause inaccurate information to be recommended to users, the present application proposes a group name classification method. Its flowchart is shown in Fig. 1, and it mainly includes the following steps:

Step 11: obtain the group name to be classified;

Step 12: determine the feature value of the group name to be classified according to the features of the followed objects contained in the group represented by that group name;

Step 13: classify the group name to be classified according to its feature value.

With the above method, the feature value of the group name to be classified is determined from the features of the followed objects contained in the group represented by that name, and the group name is then classified according to that feature value. The classification result therefore matches the features of the followed objects in the group, which solves the prior-art problem that determining the category of a group name from semantics alone leads to inaccurate information being recommended to users.

Some optional implementations of the embodiments of the present application are described in detail below.

In one implementation, to avoid interference from non-standard characters, the obtained group names to be classified may be preprocessed before step 12.

Specifically, the frequency with which each group name to be classified is used by users may be obtained; the group names whose usage frequency is greater than a preset frequency are then extracted; finally, the format of the extracted group names is normalized. Format normalization may include converting traditional Chinese characters to simplified characters, removing punctuation marks, and/or converting uppercase characters to lowercase.
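The preprocessing described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names `normalize_name` and `preprocess` and the parameter `min_frequency` are assumptions, and the traditional-to-simplified conversion is omitted because it needs a character mapping table that the text does not specify.

```python
import unicodedata
from collections import Counter


def normalize_name(name: str) -> str:
    """Drop punctuation (any Unicode 'P*' category) and lowercase the name."""
    kept = [ch for ch in name if not unicodedata.category(ch).startswith("P")]
    return "".join(kept).lower().strip()


def preprocess(raw_names: list[str], min_frequency: int) -> dict[str, int]:
    """Keep names used more than min_frequency times, then normalize them.

    Returns a mapping from normalized name to accumulated usage count.
    """
    usage = Counter(raw_names)          # usage frequency of each raw name
    result = Counter()
    for name, count in usage.items():
        if count > min_frequency:       # "greater than a preset frequency"
            result[normalize_name(name)] += count
    return dict(result)


names = ["Colleagues!", "Colleagues!", "colleagues", "colleagues", "Rare"]
print(preprocess(names, min_frequency=1))  # {'colleagues': 4}
```

Filtering happens before normalization here, following the order given in the text; normalizing first would merge spelling variants before the frequency check and could keep more names.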

In one implementation, step 12 may be carried out through the following steps A1 and A2:

Step A1: determine, as the feature of the followed objects, the association degree value corresponding to at least one group bearing the group name to be classified;

Step A2: determine the feature value of the group name to be classified according to the association degree values.

Here, the association degree value corresponding to each group indicates the degree of association between the followed objects contained in that group and the user who set up the group.

In the embodiments of the present application, step A2 may be implemented in, but is not limited to, the following two ways:

In the first implementation, step A2 can be described by the mathematical expression shown in formula [1]:

$\mathrm{AVG}_f = \frac{\sum_{i=1}^{N} \mathrm{Count}_i}{N} \qquad [1]$

Here, AVG_f denotes the feature value of the group name to be classified; N denotes the total number of users who have set that group name; i denotes the user index, with i ∈ [1, N]; and Count_i denotes the association degree value between user i and the followed objects contained in the group bearing that name which user i has set.

As a further example of formula [1]: suppose the group name to be classified is "junior high school friends" and the users who have set this group name are U1 and U2. U1 has placed 100 followed objects in the "junior high school friends" group, and 50 of those 100 also follow U1; U2 has 30 followed objects in the "junior high school friends" group, and 20 of those 30 also follow U2. Formula [1] then gives:

N = 2, Count₁ = 50, Count₂ = 20, and therefore the feature value of the group name to be classified is AVG_f = 35.
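The arithmetic of formula [1] can be checked with a short sketch. The function name `average_association` and the dictionary layout are illustrative assumptions; the numbers are the ones from the U1/U2 example.

```python
def average_association(association_values: list[float]) -> float:
    """Formula [1]: mean association degree value over the N users."""
    if not association_values:
        raise ValueError("at least one user must have set the group name")
    return sum(association_values) / len(association_values)


# Count_i from the example: followed objects that also follow the user.
mutual_followers = {"U1": 50, "U2": 20}
avg_f = average_association(list(mutual_followers.values()))
print(avg_f)  # 35.0
```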

In the second implementation, step A2 may include: determining the feature value of the group name to be classified from a first group count, namely the number of groups bearing that name whose association degree value is greater than a first threshold, and a second group count, namely the number of groups bearing that name whose association degree value is less than a second threshold. The first threshold is greater than the second threshold.

Specifically, this implementation can be described by the mathematical expression shown in formula [2]:

$\mathrm{Sub} = \frac{\mathrm{High} - \mathrm{Low}}{\mathrm{High}} \times 100\% \qquad [2]$

Here, Sub denotes the feature value of the group name to be classified; High denotes the first group count, that is, the number of groups bearing the name whose association degree value is greater than the first threshold; Low denotes the second group count, that is, the number of groups bearing the name whose association degree value is less than the second threshold.

As a further example of formula [2], assume the following:

The group names to be classified are "Junior Grade 3, Class 6", "machine learning", and "film and television artists";

The total number of groups bearing each group name to be classified is shown in Table 1 below:

Table 1:

Group name to be classified    Total number of groups bearing the name
Junior Grade 3, Class 6        110
Machine learning               837
Film and television artists    204

Further, assume that the association degree value is the mutual-follow rate, and that the distribution of mutual-follow rates among the 110 groups bearing the name "Junior Grade 3, Class 6" is as shown in Table 2:

Table 2:

Similarly, mutual-follow rate distributions like the one in Table 2 can be obtained for the group names "machine learning" and "film and television artists"; they are not repeated here.

Further, if the first threshold is assumed to be 60% and the second threshold 30%, then counting the data in the right-hand column of Table 2 gives High = 99 and Low = 0 for the group name "Junior Grade 3, Class 6", and formula [2] yields Sub = (99 - 0) / 99 × 100% = 100%. The Sub values of the other group names, "machine learning" and "film and television artists", can be obtained in the same way.
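A minimal sketch of formula [2] follows. The function name `sub_feature` is an assumption, and so is the choice to return `None` when High is zero, a case the text does not address. Because Table 2 is not reproduced here, the list of rates is hypothetical data chosen only to match the stated counts (110 groups, High = 99, Low = 0).

```python
from typing import Optional


def sub_feature(rates: list[float],
                high_threshold: float = 0.60,
                low_threshold: float = 0.30) -> Optional[float]:
    """Formula [2] as a fraction: (High - Low) / High.

    rates holds one association degree value (here a mutual-follow rate)
    per group bearing the group name.
    """
    if high_threshold <= low_threshold:
        raise ValueError("the first threshold must exceed the second")
    high = sum(1 for r in rates if r > high_threshold)  # first group count
    low = sum(1 for r in rates if r < low_threshold)    # second group count
    if high == 0:
        return None  # assumption: formula [2] is undefined when High = 0
    return (high - low) / high


# Hypothetical distribution consistent with High = 99 and Low = 0.
rates = [0.80] * 99 + [0.45] * 11
print(sub_feature(rates))  # 1.0
```

Groups whose rate lies between the two thresholds contribute to neither count, so Sub can be negative when low-rate groups outnumber high-rate ones.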

Based on the determined Sub value, one implementation of step 13 may include:

determining the category of the group name to be classified according to the Sub value and a preset Sub threshold.

In one implementation, the Sub threshold may be set according to sample group names collected in advance. For example, take a sample set consisting of the group names "Junior Grade 3, Class 6", "close friends", "brothers and sisters", "machine learning", "film and television related", and "film and television artists". Suppose that "Junior Grade 3, Class 6", "close friends", and "brothers and sisters" belong to the relationship category and that the groups bearing these names account for 39.67% of the groups in the set, while "machine learning", "film and television related", and "film and television artists" belong to the interest category and the groups bearing these names account for 60.33% of the groups in the set. Training on this sample set can then determine that the best classification result is achieved when the Sub threshold is 20%.

The details of the above assumptions are given in Table 3 below:

Table 3:

Based on Table 3, once the Sub value of a group name to be classified has been determined, the group name is assigned to the relationship category if Sub > 20% and to the interest category if Sub ≤ 20%.
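The threshold rule, and one simple way a threshold could be picked from labeled samples, can be sketched as below. The text does not say how the training is performed, so the accuracy-maximizing search in `best_threshold` is an assumption, as are the function names and the sample data.

```python
def classify_by_sub(sub: float, threshold: float = 0.20) -> str:
    """Relationship category if Sub > threshold, otherwise interest category."""
    return "relationship" if sub > threshold else "interest"


def best_threshold(samples: list[tuple[float, str]],
                   candidates: list[float]) -> float:
    """Pick the candidate threshold with the highest accuracy on the samples."""
    def accuracy(threshold: float) -> float:
        hits = sum(1 for sub, label in samples
                   if classify_by_sub(sub, threshold) == label)
        return hits / len(samples)
    return max(candidates, key=accuracy)


# Hypothetical labeled samples: (Sub as a fraction, known category).
samples = [(1.0, "relationship"), (0.5, "relationship"),
           (0.1, "interest"), (-0.4, "interest")]
print(best_threshold(samples, [0.0, 0.2, 0.6]))  # 0.2
print(classify_by_sub(1.0))                      # relationship
```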

In one implementation, step 12 may be carried out through the following steps B1 to B3:

Step B1: determine the number of users who use the group name to be classified;

Step B2: obtain, as the feature of the group name to be classified, the number of followed objects that users have placed in the group;

Step B3: determine the feature value of the group name to be classified according to the number of users and the number of followed objects.

In the embodiments of the present application, step B3 may be implemented in, but is not limited to, the following two ways:

In the first implementation, step B3 can be described by the mathematical expression shown in formula [3]:

$\mathrm{AVG}_u = \frac{\sum_{j=1}^{N} \mathrm{Count}_j}{N} \qquad [3]$

Here, AVG_u denotes the feature value of the group name to be classified; N denotes the total number of users who have set that group name; j denotes the user index, with j ∈ [1, N]; and Count_j denotes the number of followed objects that user j has placed in the group represented by the group name to be classified.

As a further example of formula [3]: suppose the group name to be classified is "celebrities" and the users who have set this group name are U1, U2, and U3. U1 has placed 100 followed objects in the "celebrities" group, U2 has placed 50, and U3 has placed 30. Formula [3] then gives:

N = 3, Count₁ = 100, Count₂ = 50, Count₃ = 30, and therefore the feature value of the group name to be classified is AVG_u = 60.
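Formula [3] can be sketched directly on group membership data. The function name `average_group_size` and the representation of each group as a set of followed-account identifiers are assumptions made for illustration; the group sizes are those of the U1/U2/U3 example.

```python
def average_group_size(groups: dict[str, set[int]]) -> float:
    """Formula [3]: mean number of followed objects per user's group.

    groups maps each user who set the group name to the identifiers of
    the followed objects that user placed in the group.
    """
    if not groups:
        raise ValueError("at least one user must have set the group name")
    return sum(len(members) for members in groups.values()) / len(groups)


# Group sizes from the example: 100, 50 and 30 followed objects.
groups = {
    "U1": set(range(100)),
    "U2": set(range(50)),
    "U3": set(range(30)),
}
avg_u = average_group_size(groups)
print(avg_u)  # 60.0
```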

In the second implementation, step B3 may include: obtaining, as the feature of the group name to be classified, the number of followed objects that users have placed in the group and that carry a specific identifier.

Specifically, if users U1 and U2 are users with the specific identifier, then:

N = 2, Count₁ = 100, Count₂ = 50, and formula [3] gives AVG_u = (100 + 50) / 2 = 75.

Based on the determined AVG_u value, one implementation of step 13 may include:

determining the category of the group name to be classified according to the AVG_u value and a preset AVG_u threshold.

In one implementation, the AVG_u threshold may be set according to sample group names collected in advance. For example, take a sample set consisting of the group names "news Weibo", "car 4S dealerships", "court peers", "Where Did Dad Go", "gold investment", and "college classmates". Suppose that "news Weibo", "car 4S dealerships", and "court peers" belong to the interest category and that the groups bearing these names account for 38.25% of the groups in the set, while "Where Did Dad Go", "gold investment", and "college classmates" belong to the relationship category and the groups bearing these names account for 61.75% of the groups in the set. Training on this sample set can then determine that the AVG_u threshold is 1.

The details of the above assumptions are given in Table 4 below:

Table 4:

Based on Table 4, when AVG_u > 1 the group name to be classified is assigned to the interest category (38.25% of the sample), and when AVG_u ≤ 1 it is assigned to the relationship category (61.75% of the sample). However, formula [3] does not take into account the degree of association between users and followed objects, so the classification it supports is coarse-grained. The feature value obtained from formula [3] is therefore used in combination with the feature value obtained from formula [2] to classify the group name to be classified.

The above describes several ways of determining the feature value of a group name to be classified. A method of classifying according to the determined feature values is described below.

First, suppose that formula [2] yields feature values of 10% and 60% for the group names "colleagues" and "celebrities", taken as their respective first feature values; that formula [3] yields feature values of 10 and 4, taken as their respective second feature values; and that formula [1] yields feature values of 100 and 70, taken as their respective third feature values.

Then the following operations are performed:

The group names to be classified are classified using the decision tree model shown in Fig. 2. The specific process is:

Determine whether the first feature value of "colleagues" is greater than 20%. The result is "no", so determine whether the second feature value is greater than 20. The second feature value of "colleagues" is not greater than 20, so "colleagues" is determined to belong to the relationship category.

Determine whether the first feature value of "celebrities" is greater than 20%. The result is "yes", so determine whether the second feature value lies in [0, 5). It does, so determine whether the third feature value lies in [0, 90). The third feature value of "celebrities" lies in [0, 90), so "celebrities" is determined to belong to the interest category.
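The two decision paths walked through above can be sketched as a small function. Only those two paths are stated in the text; Fig. 2 is not reproduced here, so every other branch returns `None` rather than guessing at its outcome. The function name `classify_with_tree` is an assumption.

```python
from typing import Optional


def classify_with_tree(first: float, second: float,
                       third: float) -> Optional[str]:
    """Partial decision tree covering only the two paths described.

    first  -- Sub from formula [2], as a fraction (0.10 means 10%)
    second -- AVG_u from formula [3]
    third  -- AVG_f from formula [1]
    """
    if first <= 0.20:
        if second <= 20:
            return "relationship"
        return None  # branch not described in the text
    if 0 <= second < 5 and 0 <= third < 90:
        return "interest"
    return None      # remaining branches not described in the text


print(classify_with_tree(0.10, 10, 100))  # relationship ("colleagues")
print(classify_with_tree(0.60, 4, 70))    # interest ("celebrities")
```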

In one implementation, after the category of each group name has been determined, the categorized group names may be normalized. Specifically, part-of-speech filtering may be used to divide the categorized group names into two parts: normalized group names and group names to be modified.

Group names in the interest category are usually composed of fairly common nouns, verbs, adjectives, and the like, so a whitelist mechanism can be used for them. Group names in the relationship category usually have complex and variable parts of speech, so a blacklist mechanism is used for them. The filtering rules may be as shown in Table 3 below:

After the normalization of group names is complete, the following may be performed for each group name to be modified: from the normalized group names, determine the one whose followed objects have the same features as the followed objects of the group name to be modified, and then change the group name to be modified to that normalized group name.
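The renaming step can be sketched as a lookup from feature values to normalized names. The text says only that the features must be "the same"; representing features as a tuple and requiring exact equality are assumptions, as are the function name `rename_to_normalized` and the example data.

```python
def rename_to_normalized(to_modify: dict[str, tuple],
                         normalized: dict[str, tuple]) -> dict[str, str]:
    """Map each group name to be modified to a normalized name whose
    followed-object features are identical.

    Both arguments map a group name to its feature tuple. Names with no
    matching normalized name are left out of the result.
    """
    by_features: dict[tuple, str] = {}
    for name, features in normalized.items():
        by_features.setdefault(features, name)  # keep the first match
    return {name: by_features[features]
            for name, features in to_modify.items()
            if features in by_features}


normalized = {"colleagues": (0.10, 10, 100)}
to_modify = {"my dear colleagues!!": (0.10, 10, 100),
             "xyz": (0.90, 1, 2)}
print(rename_to_normalized(to_modify, normalized))
# {'my dear colleagues!!': 'colleagues'}
```

In practice an exact match on numeric features would rarely occur, so a tolerance or a nearest-neighbour rule might be needed; the text does not specify one.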

It should be noted that all steps of the method provided in Embodiment 1 may be executed by the same device, or the method may be executed by different devices. For example, steps 11 and 12 may be executed by device 1 and step 13 by device 2; or step 11 may be executed by device 1 and steps 12 and 13 by device 2; and so on.

Embodiment 2

To solve the problem that prior-art grouping approaches may cause inaccurate information to be recommended to users, the present application proposes a group name classification device. Its structure is shown in Fig. 3, and it mainly includes an acquisition unit 31, a determining unit 32, and a classification unit 33, as follows:

the acquisition unit 31 is configured to obtain the group name to be classified;

the determining unit 32 is configured to determine the feature value of the group name to be classified according to the features of the followed objects contained in the group represented by that group name;

the classification unit 33 is configured to classify the group name to be classified according to its feature value.
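As a rough software analogue of the three units, the sketch below wires them together as interchangeable callables. The class name `GroupNameClassifier`, its field names, and the example feature values are all illustrative assumptions; the text does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class GroupNameClassifier:
    acquire: Callable[[], Iterable[str]]  # role of acquisition unit 31
    determine: Callable[[str], float]     # role of determining unit 32
    classify: Callable[[float], str]      # role of classification unit 33

    def run(self) -> dict[str, str]:
        """Obtain names, compute each feature value, then classify it."""
        return {name: self.classify(self.determine(name))
                for name in self.acquire()}


# Hypothetical precomputed Sub values (formula [2]) per group name.
sub_values = {"junior grade 3 class 6": 1.0, "machine learning": 0.05}

device = GroupNameClassifier(
    acquire=lambda: list(sub_values),
    determine=lambda name: sub_values[name],
    classify=lambda sub: "relationship" if sub > 0.20 else "interest",
)
print(device.run())
# {'junior grade 3 class 6': 'relationship', 'machine learning': 'interest'}
```

Keeping the three roles as separate callables mirrors the remark that different steps may run on different devices.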

In one implementation, the determining unit 32 may be configured to determine, as the feature, the association degree value corresponding to at least one group bearing the group name to be classified, where the association degree value corresponding to each group indicates the degree of association between the followed objects contained in that group and the user who set up the group; and to determine the feature value of the group name to be classified according to the association degree values.

In one implementation, the determining unit 32 may be configured to determine the feature value of the group name to be classified from the first group count, corresponding to association degree values greater than the first threshold, and the second group count, corresponding to association degree values less than the second threshold, where the first threshold is greater than the second threshold.

In one implementation, the determining unit 32 may also be configured to determine the number of users who use the group name to be classified; to obtain, as the feature, the number of followed objects that users have placed in the group; and to determine the feature value of the group name to be classified according to the number of users and the number of followed objects.

In one implementation, the determining unit 32 may be configured to obtain, as the feature of the group name to be classified, the number of followed objects that users have placed in the group and that carry a specific identifier.

With the device provided in Embodiment 2, the feature value of the group name to be classified is determined from the features of the followed objects contained in the group represented by that name, and the group name is then classified according to that feature value. The classification result therefore matches the features of the followed objects in the group, which solves the problem that prior-art grouping approaches may cause inaccurate information to be recommended to users.

本领域内的技术人员应明白，本发明的实施例可提供为方法、系统、或计算机程序产品。因此，本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on the computer or other programmable equipment to produce computer-implemented processing. The instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include computer-readable media in the form of non-permanent memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

The foregoing describes only embodiments of the present application and is not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of its claims.