Technical Field
The present invention relates to the technical field of information recommendation, and in particular to a multi-layer citation recommendation method based on a knowledge graph of document content. The invention has broad application prospects in information recommendation, information retrieval, online public-opinion monitoring and related fields.
Background
At present, information recommendation methods fall into three categories: content-based recommendation, collaborative-filtering-based recommendation, and hybrid methods.
Content-based recommendation first builds a content feature model of the items to be recommended and a user interest model. It then computes the similarity between each item and the user's interests, and finally recommends the most similar items to the user. Items and user models are usually represented by keywords. The advantage of this approach is that the user interest model can be built from the user's history, so it reflects the user's needs and preferences. It has two limitations. First, recommendation quality depends on the feature extraction method and the content feature model, that is, on the accuracy and completeness of the items' content features. Second, because items and user interests are represented and compared as keywords, matching stays at the string level; it cannot capture higher-level concepts and has difficulty meeting users' real needs.
Collaborative-filtering-based recommendation makes recommendations from correlations between items or between users. It can be divided into user-based, item-based and model-based collaborative filtering. Its advantage is that it can handle complex objects, whether structured or unstructured. Its limitations are the sparsity problem and the cold-start problem. Sparsity means that, for a user who has interacted with few items, it is hard to find users with similar interests in a large user population. Cold start means that when a new user or a new item first enters the system, the system does not yet know the new user's preferences and has difficulty recommending the new item.
Citation recommendation is an important topic in information recommendation; its goal is to find, in a massive literature collection, the papers that a given paper should cite. Existing citation recommendation methods rely mainly on the citation relationships between documents, and represent paper content and user interests with keywords.
Summary of the Invention
The object of the present invention is to provide a multi-layer citation recommendation method based on a knowledge graph of document content that addresses the following problems of the prior art: recommendation is limited by the number of similar users; documents that are semantically similar but worded differently are hard to retrieve; documents whose research objects and research behaviors are semantically related to those of the paper in other ways are hard to retrieve; and the recommended citations do not adequately meet users' needs.
The object of the present invention is achieved through the following technical solution.
A multi-layer citation recommendation method based on a knowledge graph of document content comprises the following steps:
Step 1, obtain the query
Extract the title and abstract of the paper for which citations are to be recommended, apply stemming and lemmatization, and remove punctuation and stop words. Stop words are words that carry little meaning on their own, mainly particles, prepositions and conjunctions. Then extract keywords to serve as the search terms of the query submitted to the Lucene search engine.
Step 2, expand the query using the knowledge graph of document content
First, expand the search terms of the query: obtain their synonyms and near-synonyms from a synonym dictionary and a near-synonym dictionary, and add them to the search term set.
Second, identify the research object word u and the research behavior word v of the paper from its title and abstract.
Third, use the synonym and near-synonym dictionaries to obtain the synonyms and near-synonyms of the research object word and the research behavior word, build search expansion terms from them, and add these terms to the search term set.
Let the synonyms and near-synonyms of the research object word u be $a_1, a_2, \ldots, a_m$ ($m$ a natural number), and those of the research behavior word v be $b_1, b_2, \ldots, b_n$ ($n$ a natural number). The following search expansion terms are then built, where "+" denotes the concatenation of two words; for example, "$u+b_1$" is the concatenation of word u and word $b_1$.
$$
\begin{aligned}
&u+b_1,\ u+b_2,\ \ldots,\ u+b_n,\\
&a_1+v,\ a_1+b_1,\ a_1+b_2,\ \ldots,\ a_1+b_n,\\
&a_2+v,\ a_2+b_1,\ a_2+b_2,\ \ldots,\ a_2+b_n,\\
&\ldots,\\
&a_m+v,\ a_m+b_1,\ a_m+b_2,\ \ldots,\ a_m+b_n.
\end{aligned}
$$
Fourth, use the hypernym-hyponym (is-a) relation sub-network of the knowledge graph to extract the hypernyms (superordinate concepts) and hyponyms (subordinate concepts) of the research object word u and the research behavior word v.
Let the hypernyms of u be $c_1, c_2, \ldots, c_p$ ($p$ a natural number), the hyponyms of u be $d_1, d_2, \ldots, d_q$ ($q$ a natural number), the hypernyms of v be $e_1, e_2, \ldots, e_s$ ($s$ a natural number), and the hyponyms of v be $f_1, f_2, \ldots, f_t$ ($t$ a natural number). The following search expansion terms are then built:
$$
\begin{aligned}
&u+e_j\ (j=1,2,\ldots,s),\quad u+f_j\ (j=1,2,\ldots,t),\\
&a_i+e_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,s),\quad a_i+f_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,t),\\
&c_i+v\ (i=1,2,\ldots,p),\quad d_i+v\ (i=1,2,\ldots,q),\\
&c_i+b_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,n),\quad d_i+b_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,n),\\
&c_i+e_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,s),\quad c_i+f_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,t),\\
&d_i+e_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,s),\quad d_i+f_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,t).
\end{aligned}
$$
Fifth, use the part-whole relation sub-network of the knowledge graph to extract the part concepts and whole concepts of the research object word u and the research behavior word v. Let the whole concepts of u be $g_1, g_2, \ldots, g_o$ ($o$ a natural number), the part concepts of u be $h_1, h_2, \ldots, h_r$ ($r$ a natural number), the whole concepts of v be $k_1, k_2, \ldots, k_w$ ($w$ a natural number), and the part concepts of v be $l_1, l_2, \ldots, l_z$ ($z$ a natural number). The following search expansion terms are then built:
$$
\begin{aligned}
&u+k_j\ (j=1,2,\ldots,w),\quad u+l_j\ (j=1,2,\ldots,z),\\
&a_i+k_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,w),\quad a_i+l_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,z),\\
&g_i+v\ (i=1,2,\ldots,o),\quad h_i+v\ (i=1,2,\ldots,r),\\
&g_i+b_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,n),\quad h_i+b_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,n),\\
&g_i+k_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,w),\quad g_i+l_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,z),\\
&h_i+k_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,w),\quad h_i+l_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,z).
\end{aligned}
$$
Sixth, use the coordinate relation sub-network of the knowledge graph to extract the coordinate (sibling) concepts of the research object word u and the research behavior word v. Let the coordinate concepts of u be $x_1, x_2, \ldots, x_{k_1}$ ($k_1$ a natural number) and those of v be $y_1, y_2, \ldots, y_{k_2}$ ($k_2$ a natural number). The following search expansion terms are then built:
$$u+y_j\ (j=1,2,\ldots,k_2),\quad x_i+v\ (i=1,2,\ldots,k_1).$$
Step 3, build an inverted index of the documents
Build an inverted index from the titles and abstracts of the documents in the dataset. This involves preprocessing, index construction and index storage. Preprocessing consists of stemming and lemmatization and the removal of punctuation and stop words. Index construction consists of building a term-to-document mapping, sorting the terms lexicographically, merging the document entries of identical terms, and building the postings lists, which together form the inverted index.
Step 4, select the candidate citation set
First, using the expanded search term set, retrieve from the dataset the papers whose title or abstract contains any of the search terms. Then compute the similarity between the query and each retrieved paper, and take the N papers with the highest similarity (N a natural number) as the candidate citation set. The similarity is computed with the vector space model of the Lucene search engine: the query and each paper are represented as a query vector and a paper vector, and their similarity is the cosine similarity of the two vectors.
Step 5, extract similarity features between each candidate citation and the query
Two kinds of similarity features are extracted. The first is the Lucene similarity between the candidate citation and the query. The second is the Kullback-Leibler (KL) divergence between the topic distributions of the candidate citation and the query: the topic distributions of the query and of the candidate citation are first obtained with a latent Dirichlet allocation (LDA) model, and the KL divergence between the two distributions is then computed.
Step 6, build training data for citation recommendation
First, for each training paper in the training dataset, retrieve candidate citations with the Lucene search engine using the paper's title and abstract.
Second, build one training sample for each candidate citation p. The features of the sample are the citation count of p and the similarity features between p and the query built from the training paper. The class label of the sample is 1 if the training paper cites p, and 0 otherwise. If the training paper has m references and n candidate citations are retrieved (m and n here denote these two counts), this yields at most m positive samples and the rest negative; when all m references appear among the candidates, it yields m positive samples and n-m negative samples.
Step 7, recommend citations with a gradient boosted regression tree
First, train a classification model with a gradient boosted regression tree (GBRT) to perform citation recommendation. The features are the similarity features between the candidate citation and the query, and the citation count of the candidate paper. The GBRT output is generally a real number between 0 and 1 and serves as the recommendation degree of the candidate citation: the higher the recommendation degree, the more likely the candidate belongs to the "recommend" class. The M candidate citations with the highest recommendation degree (M a natural number) are returned as the citation recommendations for the current paper.
Second, for each recommended citation p, identify its research object word x and research behavior word y from its title and abstract, and determine the multi-layer semantic relationship between p and the current paper. Let u and v be the research object word and research behavior word of the current paper, respectively:
Case 1: If x is a whole concept of u, or y is a whole concept of v, the research content of citation p contains that of the current paper. If x is a part concept of u, or y is a part concept of v, the research content of the current paper contains that of citation p.
Case 2: If x is a hypernym of u, or y is a hypernym of v, the research method of citation p can be applied to the research problem of the current paper. If x is a hyponym of u, or y is a hyponym of v, the research method of the current paper can be applied to the research problem of citation p.
Case 3: If x is a coordinate concept of u, or y is a coordinate concept of v, the current paper can draw on the research method of citation p.
This completes the method.
Beneficial Effects
Existing citation recommendation methods have difficulty retrieving documents that are semantically similar but worded differently, have difficulty retrieving documents whose research objects and research behaviors are semantically related to those of the paper in other ways, and are limited by the number of similar users. To address these problems, the method of the present invention introduces knowledge about the semantic associations between the contents of different documents and adopts a multi-layer citation recommendation approach based on a knowledge graph of document content. The method derives search expansion terms from the various semantic relations of the research object words and research behavior words in document content, and performs multi-level citation recommendation with a gradient boosted regression tree, improving the efficiency with which users obtain citations. The benefits are as follows:
(1) The invention represents the research content of a paper in two ways: by keywords extracted from its title and abstract, and by its research object words and research behavior words. This gives a semantic representation of the paper's research problem and content, expresses its topic and content more accurately, and thereby improves citation recommendation.
(2) Search expansion terms are obtained from the knowledge graph of document content, that is, from the synonym, near-synonym, hypernym-hyponym, part-whole and coordinate relations of the paper's research object words and research behavior words. This widens the range of candidate citations, which addresses the problem of missed references and the cold-start problem of a recommender system in its early stage.
(3) The invention uses a gradient boosted regression tree (GBRT) for citation recommendation and treats citation recommendation as a classification problem: the class label of each training sample is 1 or 0, meaning "recommend" or "do not recommend". This ensures both the quality of the recommendation results and the running efficiency of the method.
(4) Words that have various semantic relations with papers' research object words and research behavior words can be added to the knowledge graph of document content dynamically, so the graph can be expanded continuously, which improves the timeliness and flexibility of the citation recommendation method.
Description of Drawings
Fig. 1 is a flowchart of the method of the present invention.
Detailed Description
The method of the present invention is described in detail below with reference to an embodiment.
Embodiment
A multi-layer citation recommendation method based on a knowledge graph of document content comprises the following steps:
Step 1, obtain the query.
Extract the title and abstract of the paper for which citations are to be recommended, apply stemming and lemmatization, and remove punctuation and stop words. For example, "entities" is normalized to "entity" and "identified" to "identify". Stop words are words that carry little meaning on their own, mainly particles, prepositions and conjunctions; for example, "is", "with" and "and" are stop words. Then extract keywords to serve as the search terms of the query submitted to the Lucene search engine.
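The preprocessing in Step 1 can be sketched as follows, assuming English text. The stop-word list and the `NORMALIZED_FORMS` table are hypothetical stand-ins for a full stop-word list and a real stemmer or lemmatizer (such as a Porter stemmer), and keyword extraction itself is not shown.

```python
import re

# Hypothetical, deliberately tiny stop-word list (articles, auxiliaries, prepositions, conjunctions).
STOP_WORDS = {"a", "an", "the", "is", "are", "with", "and", "of", "on", "for", "in", "to"}

# Hypothetical lookup table standing in for a real stemmer/lemmatizer.
NORMALIZED_FORMS = {"entities": "entity", "identified": "identify", "models": "model"}


def preprocess(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics (dropping punctuation), normalize forms, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [NORMALIZED_FORMS.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("Named entities are identified with a hidden Markov model."))
# ['named', 'entity', 'identify', 'hidden', 'markov', 'model']
```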
Step 2, expand the query using the knowledge graph of document content.
First, expand the search terms of the query: obtain their synonyms and near-synonyms from a synonym dictionary and a near-synonym dictionary, and add them to the search term set.
For example, the keywords "hidden Markov model" and "named entity recognition" are extracted as search terms from a paper titled "Named Entity Recognition Based on a Hidden Markov Model". The synonym and near-synonym dictionaries then yield the expansion terms "HMM" (hidden Markov model) and "NER" (named entity recognition).
Second, identify the research object word u and the research behavior word v of the paper from its title and abstract. For example, for the paper titled "Named Entity Recognition Based on a Hidden Markov Model", the research object word is "named entity" and the research behavior word is "recognition".
Third, use the synonym and near-synonym dictionaries to obtain the synonyms and near-synonyms of the research object word and the research behavior word, build search expansion terms from them, and add these terms to the search term set.
Let the synonyms and near-synonyms of the research object word u be $a_1, a_2, \ldots, a_m$ ($m$ a natural number), and those of the research behavior word v be $b_1, b_2, \ldots, b_n$ ($n$ a natural number). The following search expansion terms are then built, where "+" denotes the concatenation of two words; for example, "$u+b_1$" is the concatenation of word u and word $b_1$, and "entity + detection" is the concatenation of "entity" and "detection", namely "entity detection".
$$
\begin{aligned}
&u+b_1,\ u+b_2,\ \ldots,\ u+b_n,\\
&a_1+v,\ a_1+b_1,\ a_1+b_2,\ \ldots,\ a_1+b_n,\\
&a_2+v,\ a_2+b_1,\ a_2+b_2,\ \ldots,\ a_2+b_n,\\
&\ldots,\\
&a_m+v,\ a_m+b_1,\ a_m+b_2,\ \ldots,\ a_m+b_n.
\end{aligned}
$$
For example, for the paper titled "Named Entity Recognition Based on a Hidden Markov Model", the near-synonyms of the research behavior word "recognition" are "detection" and "extraction", so the expansion terms "named entity detection" and "named entity extraction" are built and added to the search term set.
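The synonym-based expansion is the Cartesian product of {u, a_1, ..., a_m} and {v, b_1, ..., b_n} minus the original pair u+v. A minimal sketch, assuming "+" is rendered as a space-joined phrase in English; `synonym_expansions` is a hypothetical helper name:

```python
from itertools import product


def synonym_expansions(u, v, obj_syns, act_syns):
    """Build u+b_j, a_i+v and a_i+b_j in the order listed above.

    obj_syns = [a_1, ..., a_m], act_syns = [b_1, ..., b_n]; "+" is rendered
    as a space-joined phrase, which is an assumption for English text.
    """
    objects = [u] + list(obj_syns)   # u, a_1, ..., a_m
    actions = [v] + list(act_syns)   # v, b_1, ..., b_n
    return [f"{o} {a}" for o, a in product(objects, actions) if (o, a) != (u, v)]


print(synonym_expansions("named entity", "recognition", [], ["detection", "extraction"]))
# ['named entity detection', 'named entity extraction']
```

The total number of terms is n + m(n + 1), matching the rows of the formula.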
Fourth, use the hypernym-hyponym (is-a) relation sub-network of the knowledge graph to extract the hypernyms (superordinate concepts) and hyponyms (subordinate concepts) of the research object word u and the research behavior word v.
Let the hypernyms of u be $c_1, c_2, \ldots, c_p$ ($p$ a natural number), the hyponyms of u be $d_1, d_2, \ldots, d_q$ ($q$ a natural number), the hypernyms of v be $e_1, e_2, \ldots, e_s$ ($s$ a natural number), and the hyponyms of v be $f_1, f_2, \ldots, f_t$ ($t$ a natural number). The following search expansion terms are then built:
$$
\begin{aligned}
&u+e_j\ (j=1,2,\ldots,s),\quad u+f_j\ (j=1,2,\ldots,t),\\
&a_i+e_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,s),\quad a_i+f_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,t),\\
&c_i+v\ (i=1,2,\ldots,p),\quad d_i+v\ (i=1,2,\ldots,q),\\
&c_i+b_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,n),\quad d_i+b_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,n),\\
&c_i+e_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,s),\quad c_i+f_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,t),\\
&d_i+e_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,s),\quad d_i+f_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,t).
\end{aligned}
$$
For example, for the paper titled "Named Entity Recognition Based on a Hidden Markov Model", the hypernym "entity" of the research object word "named entity" is extracted, so the expansion terms "entity recognition", "entity detection" and "entity extraction" are built and added to the search term set.
Fifth, use the part-whole relation sub-network of the knowledge graph to extract the part concepts and whole concepts of the research object word u and the research behavior word v. Let the whole concepts of u be $g_1, g_2, \ldots, g_o$ ($o$ a natural number), the part concepts of u be $h_1, h_2, \ldots, h_r$ ($r$ a natural number), the whole concepts of v be $k_1, k_2, \ldots, k_w$ ($w$ a natural number), and the part concepts of v be $l_1, l_2, \ldots, l_z$ ($z$ a natural number). The following search expansion terms are then built:
$$
\begin{aligned}
&u+k_j\ (j=1,2,\ldots,w),\quad u+l_j\ (j=1,2,\ldots,z),\\
&a_i+k_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,w),\quad a_i+l_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,z),\\
&g_i+v\ (i=1,2,\ldots,o),\quad h_i+v\ (i=1,2,\ldots,r),\\
&g_i+b_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,n),\quad h_i+b_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,n),\\
&g_i+k_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,w),\quad g_i+l_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,z),\\
&h_i+k_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,w),\quad h_i+l_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,z).
\end{aligned}
$$
For example, for the paper titled "Named Entity Recognition Based on a Hidden Markov Model", the whole concept "entity information" of "named entity" is extracted, so the expansion terms "entity information extraction", "entity information recognition" and "entity information detection" are built and added to the search term set.
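The is-a and part-whole expansions share one structure: every pairing of an object-side term from {u, a_i, related concepts of u} with an action-side term from {v, b_j, related concepts of v}, except the pairings already produced by the synonym step (those using only u, a_i, v and b_j). A minimal sketch under that reading, where `relation_expansions` is a hypothetical helper, "+" is rendered as a space, and the related-concept lists are assumed to be looked up in the knowledge graph beforehand:

```python
from itertools import product


def relation_expansions(u, v, obj_syns, act_syns, obj_related, act_related):
    """Return every (object, action) phrase that uses at least one related concept.

    For the is-a step:      obj_related = c_1..c_p + d_1..d_q, act_related = e_1..e_s + f_1..f_t.
    For the part-whole step: obj_related = g_1..g_o + h_1..h_r, act_related = k_1..k_w + l_1..l_z.
    """
    objects = [u] + list(obj_syns) + list(obj_related)
    actions = [v] + list(act_syns) + list(act_related)
    return [f"{o} {a}" for o, a in product(objects, actions)
            if o in obj_related or a in act_related]  # skip pairs built only from u, a_i, v, b_j


near_syns = ["detection", "extraction"]
print(relation_expansions("named entity", "recognition", [], near_syns, ["entity"], []))
print(relation_expansions("named entity", "recognition", [], near_syns, ["entity information"], []))
```

With the examples above, the first call yields the three "entity ..." terms and the second the three "entity information ..." terms.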
Sixth, use the coordinate relation sub-network of the knowledge graph to extract the coordinate (sibling) concepts of the research object word u and the research behavior word v. Let the coordinate concepts of u be $x_1, x_2, \ldots, x_{k_1}$ ($k_1$ a natural number) and those of v be $y_1, y_2, \ldots, y_{k_2}$ ($k_2$ a natural number). The following search expansion terms are then built:
$$u+y_j\ (j=1,2,\ldots,k_2),\quad x_i+v\ (i=1,2,\ldots,k_1).$$
For example, for the paper titled "Named Entity Recognition Based on a Hidden Markov Model", the coordinate concepts "linking" and "disambiguation" of the research behavior word "recognition" are extracted, so the expansion terms "named entity linking" and "named entity disambiguation" (of the form $u+y_j$) are built and added to the search term set.
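Unlike the previous relations, the coordinate step combines only u with the coordinate concepts of v and the coordinate concepts of u with v, with no cross-combinations. A minimal sketch; `coordinate_expansions` is a hypothetical helper and "+" is again rendered as a space:

```python
def coordinate_expansions(u, v, obj_coords, act_coords):
    """u+y_j for each coordinate concept y_j of v, then x_i+v for each coordinate concept x_i of u."""
    return [f"{u} {y}" for y in act_coords] + [f"{x} {v}" for x in obj_coords]


print(coordinate_expansions("named entity", "recognition", [], ["linking", "disambiguation"]))
# ['named entity linking', 'named entity disambiguation']
```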
Step 3, build an inverted index of the documents.
Build an inverted index from the titles and abstracts of the documents in the dataset. This involves preprocessing, index construction and index storage. Preprocessing consists of stemming and lemmatization and the removal of punctuation and stop words. Index construction consists of building a term-to-document mapping, sorting the terms lexicographically, merging the document entries of identical terms, and building the postings lists, which together form the inverted index.
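The index construction described above (term-to-document mapping, lexicographic term order, merged postings) can be sketched in memory as follows. This is an illustration, not Lucene's on-disk index format; the JSON storage in `save_index` is an assumption, and the input tokens are assumed to be already preprocessed as in Step 1.

```python
import json
from collections import defaultdict


def build_inverted_index(docs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each term to the sorted IDs of the documents whose title+abstract contain it."""
    postings = defaultdict(set)
    for doc_id, tokens in docs.items():
        for term in tokens:
            postings[term].add(doc_id)  # repeated occurrences of a term merge into one entry
    # Terms in lexicographic order, each with a sorted postings list.
    return {term: sorted(postings[term]) for term in sorted(postings)}


def save_index(index: dict[str, list[str]], path: str) -> None:
    """Store the index as JSON (the storage format is an assumption)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False)


print(build_inverted_index({"d2": ["entity", "linking"], "d1": ["named", "entity", "recognition"]}))
# {'entity': ['d1', 'd2'], 'linking': ['d2'], 'named': ['d1'], 'recognition': ['d1']}
```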
Step 4, select the candidate citation set.
First, using the expanded search term set, retrieve from the dataset the papers whose title or abstract contains any of the search terms. Then compute the similarity between the query and each retrieved paper, and take the N papers with the highest similarity (N a natural number) as the candidate citation set. The similarity is computed with the vector space model of Lucene: the query and each paper are represented as a query vector and a paper vector, and their similarity is the cosine similarity of the two vectors.
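Candidate selection can be sketched as OR retrieval through the inverted index followed by cosine ranking. The weights below are plain TF-IDF with a smoothed IDF of 1 + ln(N / (df + 1)); this is a simplified stand-in, not Lucene's exact scoring function, and multi-word expansion terms are assumed to have been split into tokens.

```python
import math
from collections import Counter


def tfidf_vector(tokens, df, n_docs):
    """Raw term frequency times a smoothed IDF, 1 + ln(N / (df + 1))."""
    tf = Counter(tokens)
    return {t: c * (1.0 + math.log(n_docs / (df.get(t, 0) + 1))) for t, c in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_candidates(query_tokens, docs, index, top_n):
    """OR-retrieve via the inverted index, rank hits by cosine similarity, keep the top N."""
    hits = {doc_id for t in set(query_tokens) for doc_id in index.get(t, [])}
    df = {t: len(ids) for t, ids in index.items()}
    q_vec = tfidf_vector(query_tokens, df, len(docs))
    scored = [(cosine(q_vec, tfidf_vector(docs[d], df, len(docs))), d) for d in hits]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest similarity first, ties by ID
    return [(d, s) for s, d in scored[:top_n]]


docs = {"d1": ["named", "entity", "recognition"], "d2": ["entity", "linking"], "d3": ["protein", "folding"]}
index = {"entity": ["d1", "d2"], "folding": ["d3"], "linking": ["d2"],
         "named": ["d1"], "protein": ["d3"], "recognition": ["d1"]}
print(select_candidates(["named", "entity", "recognition"], docs, index, top_n=2))
```

Here "d3" shares no term with the query, so it is never scored.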
Step 5, extract similarity features between each candidate citation and the query.
Two kinds of similarity features are extracted. The first is the Lucene similarity between the candidate citation and the query. The second is the Kullback-Leibler (KL) divergence between the topic distributions of the candidate citation and the query: the topic distributions of the query and of the candidate citation are first obtained with a latent Dirichlet allocation (LDA) model, and the KL divergence between the two distributions is then computed.
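Given the two LDA topic distributions, the KL feature is a short computation. A minimal sketch, assuming the topic vectors have already been inferred by an LDA model over the same K topics; the direction KL(query || candidate) and the `eps` smoothing are assumptions, since the text does not specify them (KL divergence is not symmetric).

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_k p_k * ln(p_k / q_k) over K topics.

    p is the query's topic distribution and q the candidate's (direction assumed);
    eps guards against zero entries in q.
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same number of topics")
    return sum(pk * math.log((pk + eps) / (qk + eps)) for pk, qk in zip(p, q) if pk > 0)


print(kl_divergence([0.7, 0.2, 0.1], [0.4, 0.4, 0.2]))
```

A smaller value means the candidate's topic mixture is closer to the query's.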
Step 6, build training data for citation recommendation.
First, for each training paper in the training dataset, retrieve candidate citations with the Lucene search engine using the paper's title and abstract.
Second, build one training sample for each candidate citation p. The features of the sample are the citation count of p and the similarity features between p and the query built from the training paper. The class label of the sample is 1 if the training paper cites p, and 0 otherwise. If the training paper has m references and n candidate citations are retrieved (m and n here denote these two counts), this yields at most m positive samples and the rest negative; when all m references appear among the candidates, it yields m positive samples and n-m negative samples.
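Sample construction for one training paper can be sketched as below. The dictionary keys "id", "citations", "sim" and "kl" are hypothetical names for the citation count, the Lucene similarity and the topic KL divergence of each candidate.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    features: list[float]  # [citation_count, lucene_similarity, topic_kl]
    label: int             # 1 if the training paper cites the candidate, otherwise 0


def build_samples(candidates, references):
    """One sample per candidate citation p of a single training paper.

    candidates: list of dicts with hypothetical keys "id", "citations", "sim", "kl";
    references: IDs of the papers the training paper actually cites.
    """
    cited = set(references)
    return [
        Sample([float(c["citations"]), c["sim"], c["kl"]], 1 if c["id"] in cited else 0)
        for c in candidates
    ]
```

A reference that was not retrieved as a candidate produces no sample, which is why the number of positives can be smaller than m.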
步骤7,基于梯度渐进回归树进行引文推荐。Step 7: Citation recommendation based on gradient progressive regression tree.
第一,采用梯度渐进回归树GBRT(GradientBoostRegressionTree)来训练分类模型,实现引文推荐。分类特征包括候选引文与查询的相似度特征、论文引用次数特征。梯度渐进回归树的输出值一般为0~1之间的实数,将GBRT的输出值作为候选引文的推荐度。推荐度越大表示该候选引文分类为“推荐”的可能性就越大。进一步,将推荐度最高的M(M为自然数)篇候选引文作为当前论文的引文推荐结果。First, the gradient progressive regression tree GBRT (GradientBoostRegressionTree) is used to train the classification model and realize the citation recommendation. Classification features include the similarity features of candidate citations and queries, and the citation count features of papers. The output value of the gradient progressive regression tree is generally a real number between 0 and 1, and the output value of GBRT is used as the recommendation degree of the candidate citation. The greater the degree of recommendation, the greater the possibility that the candidate citation is classified as "recommended". Further, M (M is a natural number) candidate citations with the highest recommendation degree are taken as the citation recommendation results of the current paper.
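To make the scoring concrete, here is a dependency-free sketch of gradient boosting with squared loss on the 0/1 labels, using depth-1 regression trees (stumps). It is an illustrative stand-in for a full GBRT library such as scikit-learn's `GradientBoostingRegressor`; the tree depth, number of trees and learning rate are assumptions, not values from the method.

```python
def fit_stump(X, residuals):
    """Depth-1 regression tree: the (feature, threshold) split minimizing squared error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= thr]
            right = [r for row, r in zip(X, residuals) if row[f] > thr]
            if not left or not right:
                continue
            left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, left_mean, right_mean)
    if best is None:  # no valid split (all rows identical): use a constant leaf
        mean = sum(residuals) / len(residuals)
        return 0, float("inf"), mean, mean
    return best[1:]


def fit_gbrt(X, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        f, thr, left_val, right_val = fit_stump(X, residuals)
        trees.append((f, thr, left_val, right_val))
        pred = [p + learning_rate * (left_val if row[f] <= thr else right_val)
                for p, row in zip(pred, X)]
    return base, learning_rate, trees


def recommendation_degree(model, row):
    base, learning_rate, trees = model
    return base + sum(learning_rate * (lv if row[f] <= thr else rv) for f, thr, lv, rv in trees)


def recommend(model, candidates, top_m):
    """candidates: list of (paper_id, feature_row); returns the top_m IDs by recommendation degree."""
    ranked = sorted(candidates, key=lambda c: recommendation_degree(model, c[1]), reverse=True)
    return [paper_id for paper_id, _ in ranked[:top_m]]
```

Because the targets are 0 and 1 and the base value is their mean, the scores typically stay close to the [0, 1] range, which matches the "generally between 0 and 1" wording above.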
Second, for each recommended citation p, identify the research-object term x and the research-behavior term y in its title and abstract. Let u and v be the research-object term and the research-behavior term of the current paper, respectively. The multi-layer semantic association between each citation p and the current paper is then constructed as follows.
Case 1: If x is a whole concept of u (that is, u is a part of x), or y is a whole concept of v, then the research content of citation p includes that of the current paper. If x is a part concept of u, or y is a part concept of v, then the research content of the current paper includes that of citation p.
Case 2: If x is a superordinate (broader) concept of u, or y is a superordinate concept of v, then the research method of citation p can be applied to the research problem of the current paper. If x is a subordinate (narrower) concept of u, or y is a subordinate concept of v, then the research method of the current paper can be applied to the research problem of citation p.
Case 3: If x is a coordinate (sibling) concept of u, or y is a coordinate concept of v, then the current paper can draw on the research method of citation p.
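The three cases can be sketched as rules over a concept knowledge graph. The toy graph below, its (part, whole) and (narrower, broader) edge sets, and all function names are hypothetical. The sketch also assumes that the terms x, y, u, v have already been extracted, and it checks only direct (one-hop) relations; the patent does not specify either point.

```python
# Hypothetical toy knowledge-graph fragment: (part, whole) and (narrower, broader) pairs.
PART_OF = {("quark", "hadron")}
IS_A = {
    ("SU(N) gauge theory", "gauge theory"),
    ("U(1) gauge theory", "gauge theory"),
    ("lattice simulation", "simulation"),
}


def parents(term):
    """Direct broader concepts of a term."""
    return {broad for narrow, broad in IS_A if narrow == term}


def is_whole_of(a, b):
    return (b, a) in PART_OF          # b is a part of a


def is_part_of(a, b):
    return (a, b) in PART_OF          # a is a part of b


def is_hypernym_of(a, b):
    return (b, a) in IS_A             # a is broader than b


def is_hyponym_of(a, b):
    return (a, b) in IS_A             # a is narrower than b


def is_coordinate_of(a, b):
    return a != b and bool(parents(a) & parents(b))  # siblings under a shared parent


def associations(x, y, u, v):
    """Apply Cases 1-3 to citation terms (x, y) and current-paper terms (u, v)."""
    rules = [
        (is_whole_of, "case1: citation content includes current content"),
        (is_part_of, "case1: current content includes citation content"),
        (is_hypernym_of, "case2: citation method applies to current problem"),
        (is_hyponym_of, "case2: current method applies to citation problem"),
        (is_coordinate_of, "case3: current paper can draw on citation method"),
    ]
    return [label for rel, label in rules if rel(x, u) or rel(y, v)]


print(associations("gauge theory", "simulation", "SU(N) gauge theory", "lattice simulation"))
```

Several cases can fire at once, for example when x is broader than u while y is a sibling of v. The function returns every matching association rather than only the first.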
The present invention was tested experimentally on scientific and technical papers in the field of physics. Average precision (AP) is used to evaluate the citation recommendation results.
For paper q, let $x_q$ be its reference set and let $y_q$ be an ordered set of 2-tuples representing its citation recommendation results. The i-th element of $y_q$ is $y_q(i) = (A, B)$, where A is a paper ID and B indicates whether that paper is cited by q (1 = cited, 0 = not cited). The tuples in $y_q$ are sorted in descending order of the GBRT output. The precision of $y_q$ at position k, $P_k(y_q)$ (k a natural number), is computed as:
$$P_k(y_q) = \frac{1}{k}\sum_{i=1}^{k} \mathrm{rel}_q(i)$$
where $\mathrm{rel}_q(i)$ indicates whether the paper in $y_q(i)$ belongs to the reference set of paper q: $\mathrm{rel}_q(i) = 1$ if it does, and $\mathrm{rel}_q(i) = 0$ otherwise.
The average precision of $y_q$, $AP(y_q)$, is then computed as follows, where n is the number of 2-tuples in $y_q$:
$$AP(y_q) = \frac{1}{n}\sum_{k=1}^{n} P_k(y_q)\,\mathrm{rel}_q(k)$$
Take the paper titled "More Confining N=1 SUSY Gauge Theories from Non-Abelian Duality" as an example. The top 10 citations returned by querying the data set with Lucene are, in order: (9811119,1), (9610139,1), (9804038,0), (9807222,0), (9603206,0), (9411149,1), (9607200,0), (9408155,0), (9810014,1), (9605113,0). The top 10 citations obtained with the method of the present invention are, in order: (9411149,1), (9407087,0), (9408099,0), (9610139,1), (9811119,1), (9510101,0), (9503179,1), (9510148,1), (9408155,0), (9602031,0). Five of the method's top 10 recommendations are actually cited by the paper, compared with four for Lucene.
The average precision of the Lucene-based citation recommendation results is about 0.29, while that of the method of the present invention is about 0.33. These results indicate that the citation recommendation method of the present invention improves the efficiency with which users obtain citations. In addition, unlike collaborative filtering, the method does not rely on similar users, so it is not limited by the number of similar users. By exploiting the knowledge graph of document content, it can also recommend documents that have multi-layer semantic associations with a paper.
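The evaluation can be checked against the example rankings. Because the formula images did not survive extraction, this sketch assumes AP(y_q) = (1/n) · Σ_{k=1..n} P_k(y_q) · rel_q(k), where P_k is the share of the first k recommendations that are references and n is the number of ranked tuples. Under that reading, the two ten-item rankings reproduce the reported values of about 0.29 and 0.33. The function names are hypothetical.

```python
def precision_at_k(labels, k):
    """P_k: share of the first k recommendations that are in the paper's reference set."""
    return sum(labels[:k]) / k


def average_precision(ranked):
    """AP(y_q) = (1/n) * sum_{k=1..n} P_k(y_q) * rel_q(k), with n = len(ranked)."""
    labels = [cited for _, cited in ranked]  # rel_q(k): 1 if the k-th paper is a reference
    n = len(labels)
    return sum(precision_at_k(labels, k) * labels[k - 1] for k in range(1, n + 1)) / n


lucene = [(9811119, 1), (9610139, 1), (9804038, 0), (9807222, 0), (9603206, 0),
          (9411149, 1), (9607200, 0), (9408155, 0), (9810014, 1), (9605113, 0)]
proposed = [(9411149, 1), (9407087, 0), (9408099, 0), (9610139, 1), (9811119, 1),
            (9510101, 0), (9503179, 1), (9510148, 1), (9408155, 0), (9602031, 0)]

print(round(average_precision(lucene), 2))    # 0.29
print(round(average_precision(proposed), 2))  # 0.33
```

For the Lucene ranking, the relevant positions are 1, 2, 6 and 9, giving (1 + 1 + 3/6 + 4/9) / 10 ≈ 0.294. Normalizing by n rather than by the number of references is what makes this reading match the reported figures.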
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201511026567.7A (CN105653706B) | 2015-12-31 | 2015-12-31 | Multi-layer citation recommendation method based on a document content knowledge graph |
| Publication Number | Publication Date |
|---|---|
| CN105653706A | 2016-06-08 |
| CN105653706B | 2018-04-06 |
| Code | Title | Description |
|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2018-04-06 |