CN105653706A - Multilayer quotation recommendation method based on literature content mapping knowledge domain - Google Patents

Multilayer quotation recommendation method based on literature content mapping knowledge domain

Publication number
CN105653706A
Authority
CN
China
Prior art keywords
citation
research
paper
concept
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201511026567.7A
Other languages
Chinese (zh)
Other versions
CN105653706B (en)
Inventor
张春霞
陈俊鹏
王森
王树良
赵小林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201511026567.7A
Publication of CN105653706A
Application granted
Publication of CN105653706B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a multi-layer citation recommendation method based on a knowledge graph of document content, belonging to the fields of information recommendation and intelligent information processing. The method first obtains the user's query requirement, which consists of keywords from the title and abstract of the paper for which cited papers or references are to be recommended. It then expands the query terms using the knowledge graph of document content. The nodes of the knowledge graph are the research object words and research behavior words of the documents, and its edges represent semantic relations such as synonymy, near-synonymy, hypernymy-hyponymy, part-whole, and coordination. Finally, the method builds an inverted index over the documents in the dataset, selects candidate citations, computes the similarity between each candidate citation and the query, and uses a gradient boosted regression tree to recommend citations. By performing multi-level citation recommendation based on the document content knowledge graph, the method widens the range of candidate citations, expresses the research object and content of a paper accurately, improves the efficiency with which users obtain relevant literature, and has broad application prospects.

Description

A Multi-layer Citation Recommendation Method Based on a Document Content Knowledge Graph

Technical Field

The present invention relates to the technical field of information recommendation, and in particular to a multi-layer citation recommendation method based on a document content knowledge graph. The invention has broad application prospects in information recommendation, information retrieval, online public opinion monitoring, and related fields.

Background

At present, information recommendation methods fall into three categories: content-based recommendation, collaborative-filtering-based recommendation, and hybrid methods.

In content-based recommendation, a content feature model of the recommended objects and a user interest model are constructed first, then the similarity between each recommended object and the user's interests is calculated, and finally the objects with the highest similarity are recommended to the user. Recommended objects and user models usually represent features with keywords. The advantage of this approach is that the user interest model can be built from the user's history and so reflects the user's needs and preferences. It has two characteristics. First, recommendation performance depends on the feature extraction method and the content feature model, that is, on the accuracy and completeness of the content features of the recommended objects. Second, because recommended objects and user interest models are represented and compared by keywords, the method stays at the string level, limits the user's grasp of higher-level concepts, and has difficulty meeting the user's real needs.

Collaborative-filtering-based recommendation makes recommendations based on the correlation between recommended objects or between users. It can be divided into user-based, item-based, and model-based collaborative recommendation. Its advantage is that it can handle complex objects, both structured and unstructured. Its drawbacks are the sparsity problem and the cold-start problem. The sparsity problem means that, for a user associated with few recommended objects, it is hard to find users with similar interests in a very large user set. The cold-start problem means that when a new user or a new recommended object first appears in the system, the system can hardly learn the new user's preferences or recommend the new object.

Citation recommendation is an important research topic in information recommendation. Its goal is to find, within a massive body of literature, the papers that the current paper should cite. Existing citation recommendation methods mainly rely on the citation relationships between documents and represent paper content and user interests by keywords.

Summary of the Invention

The purpose of the present invention is to address the following problems of the prior art: recommendation is limited by the number of similar users; it is difficult to retrieve documents that are semantically similar but worded differently; it is difficult to retrieve documents whose research objects and research behaviors bear different kinds of semantic relations to those of the paper; and the recommended citations do not meet user needs well. To this end, the invention provides a multi-layer citation recommendation method based on a document content knowledge graph.

This purpose is achieved through the following technical solution.

A multi-layer citation recommendation method based on a document content knowledge graph comprises the following steps:

Step 1: Obtain the query requirement

Extract the title and abstract of the paper for which citations are to be recommended, apply stemming and lemmatization, and remove punctuation and stop words. Stop words are words that carry no substantive meaning, mainly particles, prepositions, conjunctions, and the like. Keywords are then extracted and used as the search terms of the query submitted to the search engine Lucene.

Step 2: Expand the query using the knowledge graph of document content

First, expand the search terms of the query: use a synonym dictionary and a near-synonym dictionary to obtain the synonyms and near-synonyms of the search terms, and add them to the search term set.

Second, identify the research object word $u$ and the research behavior word $v$ of the paper from its title and abstract.

Third, use the synonym and near-synonym dictionaries to extract the synonyms and near-synonyms of the research object word and the research behavior word, construct expanded search terms, and add them to the search term set.

If the synonyms and near-synonyms of the research object word $u$ are $a_1, a_2, \ldots, a_m$ ($m$ a natural number) and those of the research behavior word $v$ are $b_1, b_2, \ldots, b_n$ ($n$ a natural number), the following expanded search terms are constructed, where "+" denotes the concatenation of two words. For example, "$u+b_1$" is the concatenation of word $u$ and word $b_1$.

$u+b_1,\ u+b_2,\ \ldots,\ u+b_n,$

$a_1+v,\ a_1+b_1,\ a_1+b_2,\ \ldots,\ a_1+b_n,$

$a_2+v,\ a_2+b_1,\ a_2+b_2,\ \ldots,\ a_2+b_n,$

$\ldots,$

$a_m+v,\ a_m+b_1,\ a_m+b_2,\ \ldots,\ a_m+b_n.$
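The synonym-based expansion can be sketched as a cross product of the object-side words with the behavior-side words, minus the original pair. This is only an illustration under stated assumptions: the function name `expand_with_synonyms`, the `joiner` parameter, and the use of a space to join English words are not from the patent; the example words are the ones used in the patent's embodiment.

```python
from itertools import product

def expand_with_synonyms(u, v, obj_syns, beh_syns, joiner=" "):
    """Return u+b_j, a_i+v and a_i+b_j, leaving out the original pair u+v."""
    objects = [u, *obj_syns]      # u, a_1, ..., a_m
    behaviors = [v, *beh_syns]    # v, b_1, ..., b_n
    return [
        f"{o}{joiner}{b}"
        for o, b in product(objects, behaviors)
        if (o, b) != (u, v)       # u+v is already part of the query
    ]

# Example words taken from the embodiment; no object synonyms are assumed here.
terms = expand_with_synonyms(
    "named entity", "recognition",
    obj_syns=[],
    beh_syns=["detection", "extraction"],
)
print(terms)
```

With $m$ object synonyms and $n$ behavior synonyms this produces $(m+1)(n+1)-1$ expanded terms, in the same order as the rows listed above.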

Fourth, use the hypernym-hyponym sub-network of the knowledge graph to extract the hypernym (superordinate) and hyponym (subordinate) concepts of the research object word $u$ and the research behavior word $v$.

If the hypernyms of $u$ are $c_1, c_2, \ldots, c_p$ ($p$ a natural number), the hyponyms of $u$ are $d_1, d_2, \ldots, d_q$ ($q$ a natural number), the hypernyms of $v$ are $e_1, e_2, \ldots, e_s$ ($s$ a natural number), and the hyponyms of $v$ are $f_1, f_2, \ldots, f_t$ ($t$ a natural number), the following expanded search terms are constructed:

$u+e_j\ (j=1,2,\ldots,s),\quad u+f_j\ (j=1,2,\ldots,t),$

$a_i+e_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,s),\quad a_i+f_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,t),$

$c_i+v\ (i=1,2,\ldots,p),\quad d_i+v\ (i=1,2,\ldots,q),$

$c_i+b_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,n),\quad d_i+b_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,n),$

$c_i+e_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,s),\quad c_i+f_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,t),$

$d_i+e_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,s),\quad d_i+f_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,t).$
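Read together, the rows above are every object+behavior pair in which at least one side is a hypernym or hyponym. A minimal sketch of that reading follows; the function name `expand_with_relation`, its parameters, and the placeholder strings are hypothetical, and the sketch assumes the related concepts do not coincide with the base words.

```python
from itertools import product

def expand_with_relation(base_objs, base_behs, rel_objs, rel_behs, joiner="+"):
    """All object+behavior pairs that use at least one related concept."""
    all_objs = [*base_objs, *rel_objs]
    all_behs = [*base_behs, *rel_behs]
    base_o, base_b = set(base_objs), set(base_behs)
    return [
        f"{o}{joiner}{b}"
        for o, b in product(all_objs, all_behs)
        # pairs built only from u, a_i, v, b_j belong to the synonym step
        if not (o in base_o and b in base_b)
    ]

terms = expand_with_relation(
    base_objs=["u", "a1"],   # u and its synonyms a_i
    base_behs=["v", "b1"],   # v and its synonyms b_j
    rel_objs=["c1", "d1"],   # hypernyms c_i and hyponyms d_i of u
    rel_behs=["e1", "f1"],   # hypernyms e_j and hyponyms f_j of v
)
print(terms)
```

The part-whole expansion in the next sub-step has the same shape, so the same helper could be reused there by passing the whole and part concepts as `rel_objs` and `rel_behs`.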

Fifth, use the part-whole sub-network of the knowledge graph to extract the part concepts and whole concepts of the research object word $u$ and the research behavior word $v$. If the whole concepts of $u$ are $g_1, g_2, \ldots, g_o$ ($o$ a natural number), the part concepts of $u$ are $h_1, h_2, \ldots, h_r$ ($r$ a natural number), the whole concepts of $v$ are $k_1, k_2, \ldots, k_w$ ($w$ a natural number), and the part concepts of $v$ are $l_1, l_2, \ldots, l_z$ ($z$ a natural number), the following expanded search terms are constructed:

$u+k_j\ (j=1,2,\ldots,w),\quad u+l_j\ (j=1,2,\ldots,z),$

$a_i+k_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,w),\quad a_i+l_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,z),$

$g_i+v\ (i=1,2,\ldots,o),\quad h_i+v\ (i=1,2,\ldots,r),$

$g_i+b_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,n),\quad h_i+b_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,n),$

$g_i+k_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,w),\quad g_i+l_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,z),$

$h_i+k_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,w),\quad h_i+l_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,z).$

Sixth, use the coordinate-relation sub-network of the knowledge graph to extract the coordinate (sibling) concepts of the research object word $u$ and the research behavior word $v$. If the coordinate concepts of $u$ are $x_1, x_2, \ldots, x_{k1}$ ($k1$ a natural number) and the coordinate concepts of $v$ are $y_1, y_2, \ldots, y_{k2}$ ($k2$ a natural number), the following expanded search terms are constructed:

$u+y_j\ (j=1,2,\ldots,k2),\quad x_i+v\ (i=1,2,\ldots,k1).$

Step 3: Build the inverted index of the documents

Build an inverted index from the titles and abstracts of the documents in the dataset. This involves preprocessing, index construction, and index storage. Preprocessing consists of stemming and lemmatization and the removal of punctuation and stop words. Index construction consists of building a dictionary that maps words to documents, sorting the words in lexicographic order, merging the document mappings of identical words, and building the posting lists, which together form the inverted index of the documents.
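The index construction described above can be sketched in a few lines. This is an in-memory illustration, not Lucene's on-disk index; the function name `build_inverted_index` and the three toy documents are made up for the example, and the tokens are assumed to be already preprocessed.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs maps doc_id -> list of preprocessed tokens.

    Returns a dict mapping each term to a sorted posting list of doc ids,
    with the terms themselves in lexicographic order.
    """
    postings = defaultdict(set)
    for doc_id, tokens in docs.items():
        for tok in tokens:
            postings[tok].add(doc_id)   # merge mappings of identical words
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

# Hypothetical documents (title + abstract tokens after preprocessing).
docs = {
    1: ["name", "entity", "recognition", "hidden", "markov", "model"],
    2: ["entity", "linking", "knowledge", "graph"],
    3: ["citation", "recommendation", "knowledge", "graph"],
}
index = build_inverted_index(docs)
print(index["entity"], index["knowledge"])
```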

Step 4: Select the candidate citation set

First, using the expanded search term set, retrieve from the dataset the papers whose title and abstract contain any of the search terms. Then compute the similarity between the query and each of these papers, and take the top N papers (N a natural number) with the highest similarity as the candidate citation set. The similarity between the query and a paper is computed with the vector space model of the search engine Lucene: the query and the paper are represented by a query vector and a paper vector, and their similarity is the cosine similarity of the two vectors.
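The candidate selection can be sketched as a filter followed by a cosine ranking. The sketch below uses raw term-frequency vectors, which is an assumption; Lucene applies its own term weighting and normalization, which are not reproduced here. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def cosine(q_tokens, d_tokens):
    """Cosine similarity of two raw term-frequency vectors."""
    q, d = Counter(q_tokens), Counter(d_tokens)
    dot = sum(q[t] * d[t] for t in q.keys() & d.keys())
    norm = (math.sqrt(sum(c * c for c in q.values()))
            * math.sqrt(sum(c * c for c in d.values())))
    return dot / norm if norm else 0.0

def top_n_candidates(query_tokens, docs, n):
    """Keep documents sharing at least one query term, ranked by cosine."""
    query_terms = set(query_tokens)
    hits = {doc_id: toks for doc_id, toks in docs.items()
            if query_terms & set(toks)}
    ranked = sorted(hits, key=lambda doc_id: cosine(query_tokens, hits[doc_id]),
                    reverse=True)
    return ranked[:n]

docs = {
    1: ["name", "entity", "recognition", "hidden", "markov", "model"],
    2: ["entity", "linking", "knowledge", "graph"],
    3: ["citation", "recommendation", "knowledge", "graph"],
}
query = ["entity", "recognition"]
print(top_n_candidates(query, docs, 2))
```

Document 3 shares no term with the query, so it never enters the ranking regardless of N.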

Step 5: Extract the similarity features between candidate citations and the query

Two similarity features are extracted for each candidate citation and the query. The first is the Lucene-based similarity between the candidate citation and the query. The second is the KL divergence (Kullback-Leibler divergence) between the topic distributions of the candidate citation and the query: the topic distributions of the query and of the candidate citation are first obtained with a latent Dirichlet allocation (LDA) model, and the KL divergence between the two distributions is then computed.
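For two discrete topic distributions $P$ and $Q$ over the same topics, the KL divergence is $D_{KL}(P \,\|\, Q) = \sum_k P(k)\,\log\frac{P(k)}{Q(k)}$. The sketch below computes it for two hand-written distributions. The text does not say which distribution plays the role of $P$, so taking the query as $P$ is an assumption, as are the `eps` floor that guards against zero probabilities and the example values; fitting the LDA model itself is outside the sketch.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions over the same topics."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    # terms with p_i == 0 contribute nothing; q_i is floored at eps
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)

# Hypothetical 3-topic distributions, e.g. as produced by an LDA model.
query_topics = [0.7, 0.2, 0.1]
citation_topics = [0.6, 0.3, 0.1]
feature = kl_divergence(query_topics, citation_topics)
print(round(feature, 4))
```

Because the KL divergence is not symmetric, swapping the arguments generally gives a different feature value, so whichever direction is chosen should be used consistently for training and prediction.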

Step 6: Construct the training data for citation recommendation

First, for each training paper in the training dataset, use the search engine Lucene to retrieve candidate citations based on its title and abstract.

Second, for each candidate citation p, construct one training sample. The features of a training sample are the citation count of candidate citation p and the similarity features between p and the query built from the training paper. If the training paper cites candidate citation p, the class label of the sample is 1; otherwise it is 0. If the training paper contains m references, m positive samples and n-m negative samples can be constructed, where n is the number of candidate citations.
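The sample construction can be sketched as follows. The feature names (`citations`, `lucene_sim`, `kl`), the identifiers, and all numeric values are hypothetical placeholders; in practice the features would come from Steps 4 and 5.

```python
def build_training_samples(references, candidates):
    """Build (feature_vector, label) pairs for one training paper.

    references: set of ids that the training paper actually cites
    candidates: dict mapping candidate id -> feature dict
    """
    samples = []
    for cid, feats in candidates.items():
        x = [feats["citations"], feats["lucene_sim"], feats["kl"]]
        y = 1 if cid in references else 0   # 1 = cited by the training paper
        samples.append((x, y))
    return samples

# Made-up candidates for a single training paper.
candidates = {
    "p1": {"citations": 120, "lucene_sim": 0.82, "kl": 0.05},
    "p2": {"citations": 3,   "lucene_sim": 0.10, "kl": 0.90},
    "p3": {"citations": 45,  "lucene_sim": 0.64, "kl": 0.12},
    "p4": {"citations": 7,   "lucene_sim": 0.15, "kl": 0.75},
}
samples = build_training_samples({"p1", "p3"}, candidates)
labels = [y for _, y in samples]
print(labels)
```

Here both references appear among the four candidates, giving two positive and two negative samples.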

Step 7: Recommend citations with a gradient boosted regression tree

First, a gradient boosted regression tree (GBRT, Gradient Boost Regression Tree) is used to train a classification model that performs the citation recommendation. The classification features are the similarity features between a candidate citation and the query and the citation count of the paper. The output of the GBRT is generally a real number between 0 and 1 and is taken as the recommendation score of the candidate citation. The larger the score, the more likely the candidate citation is to be classified as "recommended". The M candidate citations with the highest scores (M a natural number) are then returned as the citation recommendation result for the current paper.

Second, for each recommended citation p, identify its research object word x and research behavior word y from its title and abstract, and establish the multi-layer semantic relations between each citation p and the current paper. Let u and v be the research object word and the research behavior word of the current paper.

Case 1: If x is a whole concept of u, or y is a whole concept of v, the research content of citation p includes the research content of the current paper. If x is a part concept of u, or y is a part concept of v, the research content of the current paper includes the research content of citation p.

Case 2: If x is a hypernym of u, or y is a hypernym of v, the research method of citation p can be applied to the research problem of the current paper. If x is a hyponym of u, or y is a hyponym of v, the research method of the current paper can be applied to the research problem of citation p.

Case 3: If x is a coordinate concept of u, or y is a coordinate concept of v, the research method of the current paper can draw on the research method of citation p.
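The three cases amount to a lookup from the relation found in the knowledge graph to a statement about how the citation relates to the current paper. A minimal sketch, assuming the knowledge graph is stored as a set of (head, relation, tail) triples meaning "head is the relation of tail"; this storage format, the relation names, and the function names are illustrative choices, not the patent's data model.

```python
EXPLANATIONS = {
    "whole": "citation's research content includes the current paper's",
    "part": "current paper's research content includes the citation's",
    "hypernym": "citation's method can be applied to the current paper's problem",
    "hyponym": "current paper's method can be applied to the citation's problem",
    "coordinate": "current paper's method can draw on the citation's method",
}

def relations(a, b, kg):
    """Relations r for which the triple (a, r, b) is recorded in kg."""
    return {r for (h, r, t) in kg if h == a and t == b}

def explain_citation(x, y, u, v, kg):
    """Explanations that hold for citation words (x, y) and paper words (u, v)."""
    found = relations(x, u, kg) | relations(y, v, kg)
    return [EXPLANATIONS[r] for r in EXPLANATIONS if r in found]

# Hypothetical graph fragment using the words of the embodiment.
kg = {
    ("entity", "hypernym", "named entity"),
    ("linking", "coordinate", "recognition"),
}
result = explain_citation("entity", "linking", "named entity", "recognition", kg)
print(result)
```

A citation about entity linking would thus be annotated with both a Case 2 and a Case 3 explanation for a paper about named entity recognition.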

This completes the entire process of the method.

Beneficial Effects

Existing citation recommendation methods have difficulty retrieving documents that are semantically similar but worded differently, have difficulty retrieving documents whose research objects and research behaviors bear different kinds of semantic relations to those of the paper, and are limited by the number of similar users. To address these problems, the method of the present invention introduces knowledge about the semantic relations between the contents of different documents and adopts a multi-layer citation recommendation method based on a document content knowledge graph. The method uses the various semantic relations of the research object words and research behavior words in document content to obtain expanded search terms, and performs multi-level citation recommendation with a gradient boosted regression tree, which improves the efficiency with which users obtain citations. Specifically:

(1) The invention represents the research content of a paper in two ways: by the keywords extracted from its title and abstract, and by its research object words and research behavior words. This gives a semantic representation of the research problem and research content, expresses the research topic and content of the paper more accurately, and thereby improves citation recommendation.

(2) Expanded search terms are obtained from the knowledge graph of document content, that is, from the synonymy, near-synonymy, hypernym-hyponym, part-whole, and coordinate relations of the research object words and research behavior words of the paper. This widens the range of candidate citations and thereby addresses the problem of missed references and the cold-start problem in the early stage of a recommender system.

(3) The invention uses a gradient boosted regression tree (GBRT) for citation recommendation and treats citation recommendation as a classification problem in which each training sample citation has class label 1 or 0, meaning "recommended" or "not recommended". This ensures both the quality of the recommendation results and the running efficiency of the method.

(4) Words that bear different semantic relations to the research object words and research behavior words of papers can be added dynamically to the knowledge graph of document content, so the knowledge graph can be extended continuously, which improves the timeliness and flexibility of the citation recommendation method.

Brief Description of the Drawings

Fig. 1 is a flowchart of the method of the present invention.

Detailed Description

The method of the present invention is described in detail below with reference to an embodiment.

Embodiment

A multi-layer citation recommendation method based on a document content knowledge graph comprises the following steps:

Step 1: Obtain the query requirement.

Extract the title and abstract of the paper for which citations are to be recommended, apply stemming and lemmatization, and remove punctuation and stop words. For example, stemming turns the word "entities" into "entity", and lemmatization turns the word "identified" into "identify". Stop words are words that carry no substantive meaning, mainly particles, prepositions, conjunctions, and the like. For example, "is", "with", and "and" are all stop words. Keywords are then extracted and used as the search terms of the query submitted to the search engine Lucene.
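The preprocessing of Step 1 can be sketched with a toy normalizer. A real system would use a proper stemmer and lemmatizer; here a small lookup table stands in for both, which is an assumption made to keep the example self-contained. The table and the stop-word set contain only the examples given in the text, and the sample sentence is made up.

```python
import re

# Only the three stop words named in the text; real lists are much longer.
STOP_WORDS = {"is", "with", "and"}

# Stand-in for a stemmer/lemmatizer, covering just the two examples above.
NORMAL_FORMS = {"entities": "entity", "identified": "identify"}

def preprocess(text):
    """Lowercase, drop punctuation and stop words, normalize word forms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())   # punctuation is dropped
    return [NORMAL_FORMS.get(t, t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("Entities identified with HMM, and CRF.")
print(tokens)
```

The same function would be applied to the titles and abstracts of the dataset before indexing, so that queries and documents share one vocabulary.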

Step 2: Expand the query using the knowledge graph of document content.

First, expand the search terms of the query: use a synonym dictionary and a near-synonym dictionary to obtain the synonyms and near-synonyms of the search terms, and add them to the search term set.

For example, from the paper titled "Named Entity Recognition Based on a Hidden Markov Model", the keywords "hidden Markov model" and "named entity recognition" are extracted as search terms. The synonym and near-synonym dictionaries then yield the expanded search terms "HMM" (hidden Markov model) and "NER" (named entity recognition).

Second, identify the research object word $u$ and the research behavior word $v$ of the paper from its title and abstract. For example, for the paper titled "Named Entity Recognition Based on a Hidden Markov Model", the research object word is "named entity" and the research behavior word is "recognition".

Third, use the synonym and near-synonym dictionaries to extract the synonyms and near-synonyms of the research object word and the research behavior word, construct expanded search terms, and add them to the search term set.

If the synonyms and near-synonyms of the research object word $u$ are $a_1, a_2, \ldots, a_m$ ($m$ a natural number) and those of the research behavior word $v$ are $b_1, b_2, \ldots, b_n$ ($n$ a natural number), the following expanded search terms are constructed, where "+" denotes the concatenation of two words. For example, "$u+b_1$" is the concatenation of word $u$ and word $b_1$, and "entity + detection" is the concatenation of the words "entity" and "detection", that is, "entity detection".

$u+b_1,\ u+b_2,\ \ldots,\ u+b_n,$

$a_1+v,\ a_1+b_1,\ a_1+b_2,\ \ldots,\ a_1+b_n,$

$a_2+v,\ a_2+b_1,\ a_2+b_2,\ \ldots,\ a_2+b_n,$

$\ldots,$

$a_m+v,\ a_m+b_1,\ a_m+b_2,\ \ldots,\ a_m+b_n.$

For example, for the paper titled "Named Entity Recognition Based on a Hidden Markov Model", the near-synonyms of the research behavior word "recognition" are "detection" and "extraction". The expanded search terms "named entity detection" and "named entity extraction" are therefore constructed and added to the search term set.

Fourth, use the hypernym-hyponym sub-network of the knowledge graph to extract the hypernym (superordinate) and hyponym (subordinate) concepts of the research object word $u$ and the research behavior word $v$.

If the hypernyms of $u$ are $c_1, c_2, \ldots, c_p$ ($p$ a natural number), the hyponyms of $u$ are $d_1, d_2, \ldots, d_q$ ($q$ a natural number), the hypernyms of $v$ are $e_1, e_2, \ldots, e_s$ ($s$ a natural number), and the hyponyms of $v$ are $f_1, f_2, \ldots, f_t$ ($t$ a natural number), the following expanded search terms are constructed:

$u+e_j\ (j=1,2,\ldots,s),\quad u+f_j\ (j=1,2,\ldots,t),$

$a_i+e_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,s),\quad a_i+f_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,t),$

$c_i+v\ (i=1,2,\ldots,p),\quad d_i+v\ (i=1,2,\ldots,q),$

$c_i+b_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,n),\quad d_i+b_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,n),$

$c_i+e_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,s),\quad c_i+f_j\ (i=1,2,\ldots,p;\ j=1,2,\ldots,t),$

$d_i+e_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,s),\quad d_i+f_j\ (i=1,2,\ldots,q;\ j=1,2,\ldots,t).$

For example, for the paper titled "Named Entity Recognition Based on a Hidden Markov Model", the hypernym "entity" of the research object "named entity" is extracted, so the expanded search terms "entity recognition", "entity detection", and "entity extraction" can be constructed and added to the search term set.

Fifth, use the part-whole sub-network of the knowledge graph to extract the part concepts and whole concepts of the research object word $u$ and the research behavior word $v$. If the whole concepts of $u$ are $g_1, g_2, \ldots, g_o$ ($o$ a natural number), the part concepts of $u$ are $h_1, h_2, \ldots, h_r$ ($r$ a natural number), the whole concepts of $v$ are $k_1, k_2, \ldots, k_w$ ($w$ a natural number), and the part concepts of $v$ are $l_1, l_2, \ldots, l_z$ ($z$ a natural number), the following expanded search terms are constructed:

$u+k_j\ (j=1,2,\ldots,w),\quad u+l_j\ (j=1,2,\ldots,z),$

$a_i+k_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,w),\quad a_i+l_j\ (i=1,2,\ldots,m;\ j=1,2,\ldots,z),$

$g_i+v\ (i=1,2,\ldots,o),\quad h_i+v\ (i=1,2,\ldots,r),$

$g_i+b_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,n),\quad h_i+b_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,n),$

$g_i+k_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,w),\quad g_i+l_j\ (i=1,2,\ldots,o;\ j=1,2,\ldots,z),$

$h_i+k_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,w),\quad h_i+l_j\ (i=1,2,\ldots,r;\ j=1,2,\ldots,z).$

For example, for the paper titled "Named Entity Recognition Based on a Hidden Markov Model", the whole concept "entity information" of "named entity" is extracted, so the expanded search terms "entity information extraction", "entity information recognition", and "entity information detection" can be constructed and added to the search term set.

Sixth, use the coordinate-relation sub-network of the knowledge graph to extract the coordinate (sibling) concepts of the research object word $u$ and the research behavior word $v$. If the coordinate concepts of $u$ are $x_1, x_2, \ldots, x_{k1}$ ($k1$ a natural number) and the coordinate concepts of $v$ are $y_1, y_2, \ldots, y_{k2}$ ($k2$ a natural number), the following expanded search terms are constructed:

$u+y_j\ (j=1,2,\ldots,k2),\quad x_i+v\ (i=1,2,\ldots,k1).$

例如,对于标题为“一种基于隐马尔科夫模型的命名实体识别”的论文,提取其研究行为词语“识别”的并列概念“链接”和“消歧”,则可构建检索扩展词“实体消歧”和“实体链接”,将它们添加到检索词集合中。For example, for a paper titled "A Named Entity Recognition Based on a Hidden Markov Model", extract the parallel concepts "link" and "disambiguation" of its research behavior word "recognition", and then construct a search extension term "entity Disambiguation" and "Entity Linking", adding them to the set of terms.

步骤3,构建文献的倒排索引。Step 3, constructing the inverted index of the document.

根据数据集中的文献的标题和摘要构建倒排索引,包括预处理、构建索引和存储索引。预处理包括词根提取和词形还原,去掉标点符号和停用词。构建索引包括构建词语到文档的映射词典,对词语按照字典顺序排序,合并相同词语的文档映射信息,构建文档倒排链表即文档倒排索引。Construct an inverted index based on the titles and abstracts of the documents in the dataset, including preprocessing, index construction and index storage. Preprocessing includes stemming and lemmatization, removing punctuation and stop words. Building an index includes building a mapping dictionary from words to documents, sorting words in lexicographical order, merging document mapping information of the same word, and building a document inverted list, that is, a document inverted index.

Step 4, select the candidate citation set.

First, using the expanded search-term set, retrieve from the data set the papers whose title and abstract contain any of the search terms. Then compute the similarity between the query and each of these papers. The N papers with the highest similarity (N is a natural number) form the candidate citation set. The similarity between the query and a paper is computed with the vector space model in Lucene: the query and the paper are represented as a query vector and a paper vector, and their similarity is the cosine similarity of the two vectors.
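A minimal sketch of the candidate selection, assuming raw term-frequency vectors. Lucene's own scoring adds weighting and normalization factors that are not reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(query_tokens, paper_tokens):
    """Cosine similarity of two term-frequency vectors."""
    q, d = Counter(query_tokens), Counter(paper_tokens)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(c * c for c in q.values())) * math.sqrt(
        sum(c * c for c in d.values())
    )
    return dot / norm if norm else 0.0


def top_n_candidates(query_tokens, papers, n):
    """Return the IDs of the n papers most similar to the query.

    papers -- dict of paper ID -> list of tokens from title and abstract
    """
    scored = []
    for paper_id, tokens in papers.items():
        score = cosine_similarity(query_tokens, tokens)
        if score > 0:  # the paper shares at least one search term with the query
            scored.append((score, paper_id))
    # Highest similarity first; ties are broken by paper ID for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [paper_id for _, paper_id in scored[:n]]


papers = {
    "p1": ["entity", "recognition"],
    "p2": ["entity", "linking"],
    "p3": ["quantum", "field"],
}
candidates = top_n_candidates(["entity", "recognition"], papers, n=3)
```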

Step 5, extract similarity features between the candidate citations and the query.

Two similarity features are used. The first is the Lucene-based similarity between the candidate citation and the query. The second is the KL divergence (Kullback-Leibler divergence) between the topic distributions of the candidate citation and the query: first, a latent Dirichlet allocation (LDA) model is used to obtain the topic distributions of the query and the candidate citation; then the KL divergence between the two distributions is computed.
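The second feature can be sketched as below, assuming the two topic distributions have already been produced by an LDA model and are given as equal-length lists of probabilities. The direction of the divergence (query relative to candidate) and the clamping constant `eps` are assumptions; the text does not specify them.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for two discrete distributions.

    p, q -- topic distributions over the same topics (assumed to sum to 1)
    eps  -- lower bound on q_i to avoid division by zero (assumption)
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:  # terms with p_i == 0 contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


query_topics = [0.5, 0.5]
candidate_topics = [0.9, 0.1]
feature = kl_divergence(query_topics, candidate_topics)
```

Because KL divergence is not symmetric, swapping the arguments generally gives a different feature value; whichever direction is chosen should be used consistently for training and prediction.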

Step 6, construct the training data for citation recommendation.

First, for each training paper in the training data set, use the Lucene search engine to retrieve candidate citations based on its title and abstract.

Second, construct one training sample for each candidate citation p. The features of a training sample are the citation count of p and the similarity features between p and the query built from the training paper. The sample's class label is 1 if the training paper cites p, and 0 otherwise. If the training paper has m references, m positive samples and n-m negative samples can be constructed, where n is the number of candidate citations.
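The sample construction can be sketched as follows. The dictionary keys (`id`, `citation_count`, `lucene_sim`, `kl`) and the function name are hypothetical; the features themselves are the ones listed in the text.

```python
def build_training_samples(references, candidates):
    """Turn the candidate citations of one training paper into labelled samples.

    references -- set of paper IDs actually cited by the training paper
    candidates -- list of dicts with hypothetical keys:
                  "id", "citation_count", "lucene_sim", "kl"
    Returns a list of (feature_vector, label) pairs.
    """
    samples = []
    for c in candidates:
        features = [c["citation_count"], c["lucene_sim"], c["kl"]]
        label = 1 if c["id"] in references else 0  # 1 = cited by the training paper
        samples.append((features, label))
    return samples


references = {"p1", "p3"}
candidates = [
    {"id": "p1", "citation_count": 12, "lucene_sim": 0.8, "kl": 0.1},
    {"id": "p2", "citation_count": 3, "lucene_sim": 0.4, "kl": 0.9},
    {"id": "p3", "citation_count": 7, "lucene_sim": 0.6, "kl": 0.3},
]
samples = build_training_samples(references, candidates)
```

Note that the count of m positive samples is reached only when every one of the m references appears among the n retrieved candidates; references that Lucene does not retrieve produce no sample in this sketch.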

Step 7, recommend citations using a gradient boosted regression tree.

First, a gradient boosted regression tree (GBRT) is used to train the classification model that performs citation recommendation. The classification features are the similarity features between the candidate citation and the query, and the citation count of the paper. The output of the GBRT is generally a real number between 0 and 1 and is used as the recommendation score of the candidate citation: the higher the score, the more likely the candidate citation is to be classified as "recommended". The M candidate citations with the highest recommendation scores (M is a natural number) are then returned as the citation recommendation result for the current paper.
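To make the ranking step concrete, here is a deliberately small gradient boosting sketch with squared-error loss and depth-1 regression trees (stumps). It is an illustration under stated assumptions, not the model used in the patent: tree depth, number of trees, learning rate, and all function names are choices made for the example, and a production system would normally use an established GBRT library.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[f] > threshold]
            if not left or not right:
                continue
            l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - l_mean) ** 2 for r in left) + sum(
                (r - r_mean) ** 2 for r in right
            )
            if best is None or err < best[0]:
                best = (err, f, threshold, l_mean, r_mean)
    return best[1:] if best else None


def fit_gbrt(X, y, n_trees=50, learning_rate=0.1):
    """Fit stumps sequentially to the residuals of the current prediction."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        f, t, l_val, r_val = stump
        stumps.append(stump)
        preds = [
            p + learning_rate * (l_val if row[f] <= t else r_val)
            for p, row in zip(preds, X)
        ]
    return base, stumps, learning_rate


def predict(model, row):
    """Recommendation score of one candidate's feature vector."""
    base, stumps, lr = model
    return base + sum(lr * (l if row[f] <= t else r) for f, t, l, r in stumps)


def recommend(model, candidates, m):
    """Return the IDs of the m candidates with the highest scores."""
    ranked = sorted(candidates, key=lambda c: -predict(model, c[1]))
    return [cid for cid, _ in ranked[:m]]


# Toy data: features are [similarity, citation count]; label 1 means "cited".
X = [[0.9, 10], [0.8, 5], [0.2, 1], [0.1, 0]]
y = [1, 0 + 1, 0, 0]
model = fit_gbrt(X, y)
top = recommend(model, [("a", [0.15, 1]), ("b", [0.85, 7]), ("c", [0.05, 0])], m=1)
```

With 0/1 labels and squared-error loss the scores tend to stay near the 0 to 1 range, which matches the remark in the text that the output is generally between 0 and 1.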

Second, for each recommended citation p, identify its research object term x and research action term y from its title and abstract, and construct the multi-layer semantic relationship between p and the current paper. Let u and v be the research object term and the research action term of the current paper, respectively.

Case 1: If x is a whole concept of u, or y is a whole concept of v, then the research content of citation p includes the research content of the current paper. If x is a part concept of u, or y is a part concept of v, then the research content of the current paper includes the research content of citation p.

Case 2: If x is a hypernym (superordinate concept) of u, or y is a hypernym of v, then the research method of citation p can be applied to the research problem of the current paper. If x is a hyponym (subordinate concept) of u, or y is a hyponym of v, then the research method of the current paper can be applied to the research problem of citation p.

Case 3: If x is a parallel concept of u, or y is a parallel concept of v, then the research method of the current paper can draw on the research method of citation p.
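The three cases amount to a lookup from a knowledge-graph relation to an explanation of how the citation relates to the current paper. A minimal sketch, assuming the graph is available as a dictionary keyed by (citation term, current-paper term); this representation, the relation labels, and the function name are hypothetical.

```python
# Explanation attached to each relation "x is the <relation> concept of u".
RELATION_MEANING = {
    "whole": "research content of the citation includes that of the current paper",
    "part": "research content of the current paper includes that of the citation",
    "hypernym": "research method of the citation applies to the current paper's problem",
    "hyponym": "research method of the current paper applies to the citation's problem",
    "parallel": "research method of the current paper can draw on that of the citation",
}


def semantic_links(graph, x, y, u, v):
    """List the semantic relationships between a citation and the current paper.

    graph -- dict mapping (citation_term, current_term) -> relation label
    x, y  -- research object and action terms of the citation
    u, v  -- research object and action terms of the current paper
    """
    relations = []
    for cited_term, current_term in ((x, u), (y, v)):
        relation = graph.get((cited_term, current_term))
        if relation in RELATION_MEANING and relation not in relations:
            relations.append(relation)
    return [(r, RELATION_MEANING[r]) for r in relations]


graph = {
    ("entity information", "named entity"): "whole",
    ("linking", "recognition"): "parallel",
}
links = semantic_links(graph, "entity information", "linking", "named entity", "recognition")
```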

The invention was evaluated experimentally on scientific papers in the field of physics. Average precision (AP) is used to evaluate the citation recommendation results.

For a paper q, let x_q be the set of references of q, and let y_q be an ordered set of 2-tuples representing the citation recommendation result for q. y_q(i) = (A, B) is the element at position i of y_q, where A is a paper ID and B indicates whether that paper is cited by q (1 if cited, 0 if not). The tuples in y_q are sorted in descending order of the GBRT output value. The precision of y_q at position k, P_k(y_q), where k is a natural number, is computed as follows.

$$P_k(y_q) = \frac{1}{k} \sum_{i=1}^{k} M_{y_q}(i)$$

Here M_{y_q}(i) indicates whether the paper in y_q(i) belongs to the reference set of paper q: M_{y_q}(i) = 1 if the paper in y_q(i) belongs to the reference set of q, and M_{y_q}(i) = 0 if it does not.

Further, the average precision AP(y_q) of y_q is computed as follows, where n is the number of 2-tuples in y_q.

$$AP(y_q) = \frac{1}{n} \sum_{k=1}^{n} P_k(y_q)\, M_{y_q}(k)$$

Take the paper titled "More Confining N=1 SUSY Gauge Theories from Non-Abelian Duality" as an example. The top 10 citations obtained by querying the data set with Lucene are, in order, (9811119,1), (9610139,1), (9804038,0), (9807222,0), (9603206,0), (9411149,1), (9607200,0), (9408155,0), (9810014,1), (9605113,0). The top 10 citations obtained with the method of the invention are, in order, (9411149,1), (9407087,0), (9408099,0), (9610139,1), (9811119,1), (9510101,0), (9503179,1), (9510148,1), (9408155,0), (9602031,0). The average precision of the Lucene-based recommendation is about 0.29, and the average precision of the recommendation produced by the method of the invention is about 0.33. These results indicate that the citation recommendation method of the invention improves the efficiency with which users obtain citations. In addition, the method does not rely on similar users and is therefore not limited by their number; by using the knowledge graph of document content, it can recommend documents that have multi-layer semantic relationships with the paper.
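The reported figures can be reproduced from the two rankings with a short Python sketch of the precision and average-precision definitions used in this evaluation. The function names are hypothetical; the label lists are the second components of the tuples listed for the example paper. Note that this definition divides by n, the length of the ranked list, rather than by the number of cited papers as some other definitions of average precision do.

```python
def precision_at_k(labels, k):
    """P_k: fraction of the first k recommended papers that are cited."""
    return sum(labels[:k]) / k


def average_precision(labels):
    """AP: mean over all n positions of P_k times the label at position k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(precision_at_k(labels, k) * labels[k - 1] for k in range(1, n + 1)) / n


# Cited / not-cited labels of the two top-10 rankings in the example.
lucene_labels = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
method_labels = [1, 0, 0, 1, 1, 0, 1, 1, 0, 0]

lucene_ap = average_precision(lucene_labels)  # about 0.29
method_ap = average_precision(method_labels)  # about 0.33
```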

Claims (8)

1. A multi-layer citation recommendation method based on a knowledge graph of document content, characterized by comprising the following steps:
Step 1, obtain the query requirement;
Step 2, expand the query using the knowledge graph of document content;
Step 3, build an inverted index of the documents;
Step 4, select the candidate citation set;
Step 5, extract similarity features between the candidate citations and the query;
Step 6, construct the training data for citation recommendation;
Step 7, recommend citations using a gradient boosted regression tree.

2. The multi-layer citation recommendation method according to claim 1, characterized in that step 1 comprises: obtaining the title and abstract of the paper for which citations are to be recommended, performing stemming and lemmatization, and removing punctuation and stop words; and extracting keywords as the search terms of the query submitted to the Lucene search engine.

3. The multi-layer citation recommendation method according to claim 1, characterized in that step 2 comprises:
First, expanding the search terms of the query by using a synonym dictionary and a near-synonym dictionary to obtain synonyms and near-synonyms of the search terms, thereby enlarging the search-term set;
Second, identifying the research object term u and the research action term v of the paper from its title and abstract;
Third, using the synonym dictionary and the near-synonym dictionary to extract the synonyms and near-synonyms of u and v, constructing expansion terms, and adding them to the search-term set;
if the synonyms and near-synonyms of u are a_1, a_2, …, a_m (m is a natural number) and those of v are b_1, b_2, …, b_n (n is a natural number), the following expansion terms are constructed, where "+" denotes the concatenation of two words; for example, "u + b_1" is the concatenation of the word u and the word b_1, and "entity + detection" is the concatenation of "entity" and "detection", that is, "entity detection":
u + b_1, u + b_2, …, u + b_n,
a_1 + v, a_1 + b_1, a_1 + b_2, …, a_1 + b_n,
a_2 + v, a_2 + b_1, a_2 + b_2, …, a_2 + b_n,
…,
a_m + v, a_m + b_1, a_m + b_2, …, a_m + b_n.
Fourth, using the hypernym-hyponym sub-network of the knowledge graph to extract the hypernyms and hyponyms of u and v;
if the hypernyms of u are c_1, c_2, …, c_p (p is a natural number), the hyponyms of u are d_1, d_2, …, d_q (q is a natural number), the hypernyms of v are e_1, e_2, …, e_s (s is a natural number), and the hyponyms of v are f_1, f_2, …, f_t (t is a natural number), the following expansion terms are constructed:
u + e_j (j = 1, 2, …, s), u + f_j (j = 1, 2, …, t),
a_i + e_j (i = 1, 2, …, m; j = 1, 2, …, s), a_i + f_j (i = 1, 2, …, m; j = 1, 2, …, t),
c_i + v (i = 1, 2, …, p), d_i + v (i = 1, 2, …, q),
c_i + b_j (i = 1, 2, …, p; j = 1, 2, …, n), d_i + b_j (i = 1, 2, …, q; j = 1, 2, …, n),
c_i + e_j (i = 1, 2, …, p; j = 1, 2, …, s), c_i + f_j (i = 1, 2, …, p; j = 1, 2, …, t),
d_i + e_j (i = 1, 2, …, q; j = 1, 2, …, s), d_i + f_j (i = 1, 2, …, q; j = 1, 2, …, t).
Fifth, using the part-whole sub-network of the knowledge graph to extract the part concepts and whole concepts of u and v; if the whole concepts of u are g_1, g_2, …, g_o (o is a natural number), the part concepts of u are h_1, h_2, …, h_r (r is a natural number), the whole concepts of v are k_1, k_2, …, k_w (w is a natural number), and the part concepts of v are l_1, l_2, …, l_z (z is a natural number), the following expansion terms are constructed:
u + k_j (j = 1, 2, …, w), u + l_j (j = 1, 2, …, z),
a_i + k_j (i = 1, 2, …, m; j = 1, 2, …, w), a_i + l_j (i = 1, 2, …, m; j = 1, 2, …, z),
g_i + v (i = 1, 2, …, o), h_i + v (i = 1, 2, …, r),
g_i + b_j (i = 1, 2, …, o; j = 1, 2, …, n), h_i + b_j (i = 1, 2, …, r; j = 1, 2, …, n),
g_i + k_j (i = 1, 2, …, o; j = 1, 2, …, w), g_i + l_j (i = 1, 2, …, o; j = 1, 2, …, z),
h_i + k_j (i = 1, 2, …, r; j = 1, 2, …, w), h_i + l_j (i = 1, 2, …, r; j = 1, 2, …, z).
Sixth, using the parallel-relation sub-network of the knowledge graph to extract the parallel concepts of u and v; if the parallel concepts of u are x_1, x_2, …, x_k1 (k1 is a natural number) and the parallel concepts of v are y_1, y_2, …, y_k2 (k2 is a natural number), the following expansion terms are constructed:
u + y_j (j = 1, 2, …, k2), x_i + v (i = 1, 2, …, k1).

4. The multi-layer citation recommendation method according to claim 1, characterized in that step 3 comprises:
building an inverted index from the titles and abstracts of the documents in the data set, including preprocessing, index construction, and index storage; preprocessing includes stemming and lemmatization, and removal of punctuation and stop words; index construction includes building a dictionary that maps words to documents, sorting the words in lexicographic order, merging the document mapping information of identical words, and building the document posting lists, which form the inverted index.

5. The multi-layer citation recommendation method according to claim 1, characterized in that step 4 comprises:
first, using the expanded search-term set, retrieving from the data set the papers whose title and abstract contain any of the search terms; then computing the similarity between the query and each of these papers; and taking the N papers with the highest similarity (N is a natural number) as the candidate citation set; wherein the similarity between the query and a paper is computed with the vector space model in the Lucene search engine; the query and the paper are represented as a query vector and a paper vector, and their similarity is the cosine similarity of the two vectors.

6. The multi-layer citation recommendation method according to claim 1, characterized in that step 5 comprises:
the similarity features between a candidate citation and the query are of two kinds; the first is the similarity between the candidate citation and the query based on the Lucene search engine; the second is the KL divergence (Kullback-Leibler divergence) between the topic distributions of the candidate citation and the query; first, a latent Dirichlet allocation (LDA) model is used to obtain the topic distributions of the query and the candidate citation; then the KL divergence between the two topic distributions is computed.

7. The multi-layer citation recommendation method according to claim 1, characterized in that step 6 comprises:
First, for each training paper in the training data set, using the Lucene search engine to retrieve candidate citations based on its title and abstract;
Second, constructing one training sample for each candidate citation p; the features of a training sample include the citation count of p and the similarity features between p and the query built from the training paper; the class label of the sample is 1 if the training paper cites p, and 0 otherwise; if the training paper has m references, m positive samples and n-m negative samples can be constructed, where n is the number of candidate citations.

8. The multi-layer citation recommendation method according to claim 1, characterized in that step 7 comprises:
First, using a gradient boosted regression tree (GBRT) to train the classification model that performs citation recommendation; the classification features include the similarity features between the candidate citation and the query, and the citation count of the paper; the output of the GBRT is generally a real number between 0 and 1 and is used as the recommendation score of the candidate citation; the higher the score, the more likely the candidate citation is to be classified as "recommended"; the M candidate citations with the highest recommendation scores (M is a natural number) are taken as the citation recommendation result for the current paper;
Second, for each recommended citation p, identifying its research object term x and research action term y from its title and abstract, and constructing the multi-layer semantic relationship between p and the current paper; where u and v are the research object term and the research action term of the current paper, respectively,
Case 1: if x is a whole concept of u, or y is a whole concept of v, the research content of citation p includes the research content of the current paper; if x is a part concept of u, or y is a part concept of v, the research content of the current paper includes the research content of citation p;
Case 2: if x is a hypernym of u, or y is a hypernym of v, the research method of citation p can be applied to the research problem of the current paper; if x is a hyponym of u, or y is a hyponym of v, the research method of the current paper can be applied to the research problem of citation p;
Case 3: if x is a parallel concept of u, or y is a parallel concept of v, the research method of the current paper can draw on the research method of citation p.
Application: CN201511026567.7A | Priority date: 2015-12-31 | Filing date: 2015-12-31 | Title: Multi-layer citation recommendation method based on a knowledge graph of document content | Status: Expired - Fee Related | Publication: CN105653706B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201511026567.7A (CN105653706B (en)) | 2015-12-31 | 2015-12-31 | Multi-layer citation recommendation method based on a knowledge graph of document content

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201511026567.7A (CN105653706B (en)) | 2015-12-31 | 2015-12-31 | Multi-layer citation recommendation method based on a knowledge graph of document content

Publications (2)

Publication Number | Publication Date
CN105653706A (en) | 2016-06-08
CN105653706B (en) | 2018-04-06

Family

Family ID: 56490788

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201511026567.7A (CN105653706B (en), Expired - Fee Related) | Multi-layer citation recommendation method based on a knowledge graph of document content | 2015-12-31 | 2015-12-31

Country Status (1)

Country | Link
CN (1) | CN105653706B (en)


CN109815335A (en)*2019-01-262019-05-28福州大学 A paper domain classification method suitable for document network
CN110033851A (en)*2019-04-022019-07-19腾讯科技(深圳)有限公司Information recommendation method, device, storage medium and server
CN110427465B (en)*2019-08-142022-03-04北京奇艺世纪科技有限公司Content recommendation method and device based on word knowledge graph
CN110427465A (en)*2019-08-142019-11-08北京奇艺世纪科技有限公司A kind of content recommendation method and device based on word knowledge mapping
CN110674308A (en)*2019-08-232020-01-10上海科技发展有限公司Scientific and technological word list expansion method, device, terminal and medium based on grammar mode
CN110532393A (en)*2019-09-032019-12-03腾讯科技(深圳)有限公司Text handling method, device and its intelligent electronic device
CN110532393B (en)*2019-09-032023-09-26腾讯科技(深圳)有限公司Text processing method and device and intelligent electronic equipment thereof
CN110688838A (en)*2019-10-082020-01-14北京金山数字娱乐科技有限公司Idiom synonym list generation method and device
CN110688838B (en)*2019-10-082023-07-18北京金山数字娱乐科技有限公司Idiom synonym list generation method and device
CN111091454A (en)*2019-11-052020-05-01新华智云科技有限公司Financial public opinion recommendation method based on knowledge graph
CN111090743B (en)*2019-11-262023-05-09华南师范大学Thesis recommendation method and device based on word embedding and multi-value form concept analysis
CN111090743A (en)*2019-11-262020-05-01华南师范大学Thesis recommendation method and device based on word embedding and multi-valued form concept analysis
CN111078884A (en)*2019-12-132020-04-28北京小米智能科技有限公司Keyword extraction method, device and medium
CN111078884B (en)*2019-12-132023-08-15北京小米智能科技有限公司Keyword extraction method, device and medium
CN111460324A (en)*2020-06-182020-07-28杭州灿八科技有限公司Citation recommendation method and system based on link analysis
CN111930891A (en)*2020-07-312020-11-13中国平安人寿保险股份有限公司Retrieval text expansion method based on knowledge graph and related device
CN111930891B (en)*2020-07-312024-02-02中国平安人寿保险股份有限公司Knowledge graph-based search text expansion method and related device
CN112100405B (en)*2020-09-232024-01-30中国农业大学Veterinary drug residue knowledge graph construction method based on weighted LDA
CN112100405A (en)*2020-09-232020-12-18中国农业大学Veterinary drug residue knowledge graph construction method based on weighted LDA
CN112287218A (en)*2020-10-262021-01-29安徽工业大学 A non-coal mine document association recommendation method based on knowledge graph
CN112364151A (en)*2020-10-262021-02-12西北大学Thesis hybrid recommendation method based on graph, quotation and content
CN112667773A (en)*2020-12-232021-04-16医渡云(北京)技术有限公司Data acquisition method based on knowledge graph and related equipment
CN112925895A (en)*2021-03-292021-06-08中国工商银行股份有限公司Natural language software operation and maintenance method and device
CN115329065A (en)*2022-08-152022-11-11南京邮电大学 A Method of Generating Differences in Scientific and Technological Literature Based on Citation, Diagram and Structure
CN115329065B (en)*2022-08-152025-10-10南京邮电大学 A method for generating differences in scientific literature based on citations, graphs and structures
CN115618014A (en)*2022-10-212023-01-17上海研途标准化技术服务有限公司Standard document analysis management system and method applying big data technology
CN116244497A (en)*2022-12-072023-06-09北京理工大学Cross-domain paper recommendation method based on heterogeneous data embedding
CN118467792A (en)*2024-05-312024-08-09北京安和进智科技有限公司 A scientific and technological achievement recommendation system based on artificial intelligence interaction
CN118838872A (en)*2024-07-022024-10-25国网江苏省电力有限公司南通供电分公司Associated indexing method and system based on graph database
CN118551031A (en)*2024-07-232024-08-27广州平云信息科技有限公司Platform content intelligent recommendation method and system based on natural language processing
CN118585710A (en)*2024-08-072024-09-03杭州研趣信息技术有限公司 A method, device, equipment and medium for recommending instruments based on multi-agent
CN118585710B (en)*2024-08-072024-10-11杭州研趣信息技术有限公司Instrument recommendation method, device, equipment and medium based on multiple intelligent agents

Also Published As

Publication number | Publication date
CN105653706B (en) | 2018-04-06

Similar Documents

Publication | Publication Date | Title
CN105653706B (en) |  | A kind of multilayer quotation based on literature content knowledge mapping recommends method
CN109829104B (en) |  | Semantic similarity based pseudo-correlation feedback model information retrieval method and system
CN108052593B (en) |  | A topic keyword extraction method based on topic word vector and network structure
CN106997382B (en) |  | Automatic labeling method and system for innovative creative labels based on big data
CN106777274B (en) |  | A kind of Chinese tour field knowledge mapping construction method and system
CN104391942B (en) |  | Short essay eigen extended method based on semantic collection of illustrative plates
CN106649260B (en) |  | Product characteristic structure tree construction method based on comment text mining
CN103544242B (en) |  | Microblog-oriented emotion entity searching system
CN103605665B (en) |  | Keyword based evaluation expert intelligent search and recommendation method
CN104239513B (en) |  | A Semantic Retrieval Method for Domain Data
CN101944099B (en) |  | Method for automatically classifying text documents by utilizing body
CN105045875B (en) |  | Personalized search and device
CN111581354A (en) |  | A method and system for calculating similarity of FAQ questions
CN104794169B (en) |  | A kind of subject terminology extraction method and system based on sequence labelling model
CN111950285A (en) |  | Intelligent automatic construction system and method of medical knowledge graph based on multimodal data fusion
CN103744984B (en) |  | Method of retrieving documents by semantic information
CN108763213A (en) |  | Theme feature text key word extracting method
CN107247780A (en) |  | A kind of patent document method for measuring similarity of knowledge based body
CN106776711A (en) |  | A kind of Chinese medical knowledge mapping construction method based on deep learning
CN104298776B (en) |  | Search-engine results optimization system based on LDA models
CN108520038B (en) |  | Biomedical literature retrieval method based on sequencing learning algorithm
CN110633365A (en) |  | A hierarchical multi-label text classification method and system based on word vectors
CN110888991B (en) |  | A segmented semantic annotation method in a weak annotation environment
CN103927358A (en) |  | Text search method and system
CN103838833A (en) |  | Full-text retrieval system based on semantic analysis of relevant words

Legal Events

Date | Code | Title | Description
 | C06 | Publication |
 | PB01 | Publication |
 | C10 | Entry into substantive examination |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |
 | GR01 | Patent grant |
 | CF01 | Termination of patent right due to non-payment of annual fee |
 | CF01 | Termination of patent right due to non-payment of annual fee |

Granted publication date: 20180406

