CN118229465A

Info

Publication number: CN118229465A
Application number: CN202410610670.9A
Authority: CN
Inventors: 赖培源; 李岱素; 江昊钒; 廖晓东; 蔡焕涛; 刘士雨; 李奎; 梁育玮; 孙晓麒; 黄俊铮
Original assignee: Guangdong South China Technology Transfer Center Co ltd
Current assignee: Guangdong South China Technology Transfer Center Co ltd
Priority date: 2024-05-16
Filing date: 2024-05-16
Publication date: 2024-06-21
Anticipated expiration: 2044-05-16
Also published as: CN118229465B

Abstract

The invention discloses a pre-application patent quality assessment method and system based on cluster center representation, comprising the following steps: extracting keywords from the patent text input by a user and using them for retrieval, generating a sub-dataset with similar features from the patent big data, and generating a center representation of the sub-dataset through a clustering model; extracting the patent information to be predicted from the patent text input by the user and generating its text representation; calculating the similarity between the text representation of the patent information to be predicted and the center representation, and generating constraint information based on the similarity combined with patent quality indicators; and training a patent quality assessment model with the constraint information to obtain a multi-dimensional quality evaluation result for the patent input by the user. While solving the problem of comparison against massive data, the invention quickly performs a multi-dimensional quality analysis of the patent the user plans to file, which helps raise the success rate of the user's application, cultivate high-value patents, and reduce the cost of patent applications for enterprises.

Description

Translated from Chinese

Pre-application patent quality assessment method and system based on cluster center representation

Technical Field

The present invention relates to the technical field of patent quality assessment, and more specifically to a pre-application patent quality assessment method and system based on cluster center representation.

Background

Patents are an important part of intellectual property and a principal outcome of scientific and technological innovation. The number of patents reflects their overall scale, while patent quality reflects how good or bad the patents are. At present, the patent level of a region is usually measured by counting patents, and patent quality is ignored, which gives only a one-sided picture of the real situation. In recent years the number of patents has grown explosively, which poses many challenges for patent examination and for the transfer and application of patents. Patent quality has therefore received close attention, and choosing a scientific and reasonable patent quality evaluation method has become a hot research topic. In particular, when analyzing massive data, building small, segmented datasets to assist quality assessment is an important direction for applying evaluation models at scale.

At present, the number of patent application documents is growing rapidly, but patent practitioners are too few and their expertise is uneven, which increases their workload and indirectly lowers the quality of patent application documents. The application documents in turn affect the quality of the patent application. Improving their quality on the one hand fully sets out the scope of protection of an enterprise's current R&D solution and supports better intellectual property services for the enterprise, and on the other hand helps improve the quality of patent applications. Multi-dimensional quality assessment of the patent application text before filing is therefore a problem in urgent need of a solution.

Summary of the Invention

To solve the above technical problems, the present invention proposes a pre-application patent quality assessment method and system based on cluster center representation.

A first aspect of the present invention provides a pre-application patent quality assessment method based on cluster center representation, comprising:

extracting keywords from the patent text input by the user and using them for retrieval, generating from the patent big data a sub-dataset whose feature similarity meets a preset standard, and generating a center representation of the sub-dataset through a clustering model;

extracting the patent information to be predicted from the patent text input by the user, and generating a text representation of the patent information to be predicted;

calculating the similarity between the patent information to be predicted and the center representation, and generating constraint information based on the similarity combined with patent quality indicators;

training a patent quality assessment model with the constraint information, and obtaining a multi-dimensional quality evaluation result for the patent input by the user through the patent quality assessment model.

In this solution, extracting keywords from the patent text input by the user for retrieval and generating from the patent big data a sub-dataset whose feature similarity meets the preset standard is specifically:

obtaining the patent text input by the user and preprocessing it by word segmentation to generate a serialized representation of the patent text, determining the part-of-speech tag of each word vector in the serialized representation, and using the part-of-speech tags for sequence labeling;

using RoBERTa to truncate, chunk and embed the serialized representation of the patent text to obtain the embedding vectors of the patent text, selecting preset phrases by part-of-speech tag, selecting the corresponding embedding vectors by matching the position features of the preset phrases, and concatenating the selected embedding vectors to obtain concatenated embedding vectors;

introducing a self-attention mechanism over the embedding vectors of the patent text so that the self-attention weights strengthen the features of the embedding vectors, and introducing cross-attention between the concatenated embedding vectors and the embedding vectors to obtain the neighborhood embedding vectors of the concatenated embedding vectors, thereby strengthening contextual semantics;

obtaining the attention-encoded embedding vector sequence and neighborhood embedding vector sequence, calculating the similarity between the embedding vectors and the neighborhood embedding vectors in the sequences, and decoding the concatenated embedding vectors whose similarity meets a preset similarity threshold as the keyword extraction result;

building a retrieval index from the keywords, using the retrieval index to calculate keyword feature similarity over the massive patent big data, and collecting the patent data that meet the preset similarity standard into a sub-dataset containing the keywords.
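The final retrieval step, which keeps only patents whose keyword features are similar enough to the query, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not name the similarity measure used for retrieval, so cosine similarity is assumed, and the function names, the toy vectors and the threshold value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_sub_dataset(query_vec, corpus, threshold):
    """Return the ids of patents whose feature vector meets the similarity threshold.

    `corpus` maps a patent id to its keyword-feature vector (hypothetical layout).
    """
    return [pid for pid, vec in corpus.items()
            if cosine_similarity(query_vec, vec) >= threshold]


# Hypothetical toy data: three patents described by three keyword features.
corpus = {
    "P1": [1.0, 0.0, 1.0],
    "P2": [0.0, 1.0, 0.0],
    "P3": [0.9, 0.1, 0.8],
}
query = [1.0, 0.0, 1.0]
sub_dataset = build_sub_dataset(query, corpus, threshold=0.8)
print(sub_dataset)
```

In this toy example P2 shares no keyword features with the query and is filtered out, while P1 and P3 form the sub-dataset that is passed on to clustering.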

In this solution, generating the center representation of the sub-dataset through the clustering model is specifically:

using the sparrow search algorithm to optimize the initial cluster centers of the sub-dataset, initializing the parameters of the sparrow search algorithm, calculating the fitness values of the sparrow population, and obtaining the best and worst fitness values and their corresponding positions;

selecting discoverers, joiners and scouts and updating their positions, introducing an adaptive t-distribution mutation into the position update, iteratively calculating fitness and updating the sparrow positions, and, once the maximum number of iterations is reached, outputting the best sparrow position to obtain the cluster center matrix;

obtaining the initial cluster centers from the cluster center matrix, using Euclidean distance as the metric, assigning each patent in the sub-dataset to the nearest initial cluster center, and updating the cluster center of each cluster after all patent data have been assigned;

obtaining the final clustering result of the sub-dataset through iterative clustering, and generating the center representations of the sub-dataset from the resulting clusters.
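The assignment and update steps above are those of K-means started from given initial centers. The sketch below assumes the initial centers have already been produced by the sparrow search stage and simply takes them as an argument; the function names, the toy points and the stopping rule (stop when the centers no longer change) are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, initial_centers, max_iter=100):
    """K-means from supplied initial centers; returns (centers, labels)."""
    centers = [list(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        if new_centers == centers:  # converged: centers did not move
            break
        centers = new_centers
    return centers, labels


# Hypothetical 2-D embeddings of four patents and two initial centers.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
centers, labels = kmeans(points, initial_centers=[[1.0, 1.0], [9.0, 9.0]])
print(centers, labels)
```

The returned centers play the role of the center representations: one vector per cluster of the sub-dataset.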

In this solution, extracting the patent information to be predicted from the patent text input by the user and generating its text representation is specifically:

extracting the patent information to be predicted from the patent text input by the user according to preset paragraph positions and indicator keywords, extracting the embedding vectors of the corresponding patent text, and dividing them into word embedding vectors, sentence embedding vectors and paragraph embedding vectors;

feeding the embedding vectors of the patent information to be predicted into a bidirectional long short-term memory network, introducing an attention mechanism so that the embedding vectors at the different levels are processed by a forward LSTM and a backward LSTM, combining the forward and backward results through the hidden layer, and outputting the semantic features corresponding to the embedding vectors of the patent information to be predicted;

matching the semantic features, by position encoding, with the embedding vectors at the different levels of the patent information to be predicted to generate the text representation of the patent information to be predicted.

In this solution, generating the constraint information based on the similarity combined with patent quality indicators is specifically:

calculating the similarity between the text representation of the patent information to be predicted and every center representation of the sub-dataset, by computing the cosine similarity between the embedding vectors after dimension alignment, and, when the cosine similarity exceeds a preset threshold, extracting the semantic features of the patent information to be predicted at the corresponding position and using them to correct the similarity;

traversing the patent information to be predicted to obtain all similarities, computing the mean of their absolute values as the average similarity, and taking the reciprocal of the average similarity as one item of the constraint information;

using a big data engine to obtain patent quality evaluation instances, extracting patent quality evaluation indicators from the instances, and performing principal component analysis on the indicators to identify the key influencing factors;

obtaining from the patent quality evaluation instances the interactions between the key influencing factors and the patent text and among the key influencing factors themselves, forming triples from these interactions and the attributes of the key influencing factors, and using a knowledge graph convolutional network to learn the graph structure and build a knowledge graph;

calculating the centrality of each node in the knowledge graph from the number of relation edges directly connected to the key influencing factor, using the centrality to characterize the importance of the key influencing factor, selecting a preset number of key influencing factors by importance, and taking the corresponding indicator variables as constraint information.
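Two of the quantities above are simple enough to show directly: the reciprocal of the mean absolute similarity, and the selection of key factors by the number of directly connected edges (degree centrality). The following is a sketch under stated assumptions: the triples, the factor names and the tie-breaking rule (alphabetical) are hypothetical, and the knowledge graph is represented as a plain list of (head, relation, tail) triples rather than a learned graph.

```python
from collections import Counter


def similarity_constraint(similarities):
    """Reciprocal of the mean absolute similarity."""
    if not similarities:
        raise ValueError("at least one similarity is required")
    mean_abs = sum(abs(s) for s in similarities) / len(similarities)
    if mean_abs == 0.0:
        raise ValueError("mean absolute similarity is zero; reciprocal undefined")
    return 1.0 / mean_abs


def top_factors_by_degree(triples, candidates, k):
    """Rank candidate factors by how many edges touch them and keep the top k."""
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    # Higher degree first; ties broken alphabetically (an arbitrary choice).
    ranked = sorted(candidates, key=lambda f: (-degree[f], f))
    return ranked[:k]


# Hypothetical similarities between the text representation and three centers.
sims = [0.5, -0.25, 0.75]
constraint = similarity_constraint(sims)  # mean |s| = 0.5, so the result is 2.0

# Hypothetical triples linking factors to the patent text and to each other.
triples = [
    ("claim_count", "affects", "patent_text"),
    ("citation_count", "affects", "patent_text"),
    ("claim_count", "correlates_with", "citation_count"),
    ("claim_count", "correlates_with", "family_size"),
]
factors = ["claim_count", "citation_count", "family_size"]
selected = top_factors_by_degree(triples, factors, k=2)
print(constraint, selected)
```

Because the constraint is a reciprocal, a draft that is very close to existing patents in the sub-dataset yields a small value and a more distinctive draft yields a larger one.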

In this solution, training the patent quality assessment model with the constraint information is specifically:

building the patent quality assessment model, training a corresponding encoder on the training data of each category of patent quality indicator in the constraint information, and using the encoders of the different patent quality indicators to extract indicator features from the text representation of the patent information to be predicted;

feeding the indicator features of the patent information to be predicted, together with the reciprocal of the average similarity between the text representation and the center representations, into separate multi-layer perceptrons to obtain a feature importance matrix, applying co-attention to obtain the attention distribution over feature importance, and obtaining by weighted calculation the representations of the patent information to be predicted under the different constraints;

fully connecting the indicator features with the weighted representations, letting them interact through a multi-layer perceptron to produce an output vector, converting the output vector into a probability distribution to obtain the predicted evaluation, and scoring with the MSE metric to obtain the quality assessment result for the patent information to be predicted.
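The last step, turning an output vector into a probability distribution and scoring it with MSE, can be sketched as below. The source does not say how the conversion is done, so softmax is an assumption here, and the three-grade output, the logits and the one-hot target are hypothetical.

```python
import math


def softmax(logits):
    """Convert a vector of scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse(predicted, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Hypothetical perceptron output for three quality grades (high, medium, low).
logits = [2.0, 1.0, 0.0]
probs = softmax(logits)

# Hypothetical reference label: the patent is known to be of the first grade.
target = [1.0, 0.0, 0.0]
loss = mse(probs, target)
print(probs, loss)
```

A lower MSE means the predicted distribution is closer to the reference evaluation; during training this value would be the quantity to minimize.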

A second aspect of the present invention provides a pre-application patent quality assessment system based on cluster center representation, the system comprising a memory, a processor, a user interaction module, an assessment dataset generation module, a quality assessment module and a data storage management module, wherein the memory stores, and the processor executes, a program of the pre-application patent quality assessment method based on cluster center representation;

the user interaction module is used by the user to input a keyword group that determines the patent data subset used for assessment, serves as the input window for the patent information to be predicted, and returns and displays the system's assessment result to the user;

the assessment dataset generation module generates the sub-dataset from the patent big dataset according to the keyword group provided by the user;

the quality assessment module performs the quality assessment based on the patent information to be assessed and the sub-dataset;

the data storage management module stores the patent big dataset and the patent subsets generated from user keyword groups, which facilitates running non-real-time assessment tasks.

The present invention discloses a pre-application patent quality assessment method and system based on cluster center representation, comprising: extracting keywords from the patent text input by the user for retrieval, generating a sub-dataset with similar features from the patent big data, and generating the center representation of the sub-dataset through a clustering model; extracting the patent information to be predicted from the patent text input by the user and generating its text representation; calculating the similarity between the text representation of the patent information to be predicted and the center representation, and generating constraint information based on the similarity combined with patent quality indicators; and training a patent quality assessment model with the constraint information to obtain a multi-dimensional quality evaluation result for the patent input by the user. While solving the problem of comparison against massive data, the present invention quickly performs a multi-dimensional quality analysis of the patent the user plans to file, which helps raise the success rate of the user's application, cultivate high-value patents, and reduce the cost of patent applications for enterprises.

Brief Description of the Drawings

FIG. 1 shows a flow chart of the pre-application patent quality assessment method based on cluster center representation according to the present invention;

FIG. 2 shows a flow chart of generating the center representation of the sub-dataset according to the present invention;

FIG. 3 shows a flow chart of constructing the patent quality assessment model according to the present invention;

FIG. 4 shows a block diagram of the pre-application patent quality assessment system based on cluster center representation according to the present invention.

Detailed Description

To make the above objects, features and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present application and the features in the embodiments may be combined with one another provided they do not conflict.

Many specific details are set forth in the following description to facilitate a full understanding of the present invention. However, the present invention may also be implemented in ways other than those described here, so the scope of protection of the present invention is not limited by the specific embodiments disclosed below.

FIG. 1 shows a flow chart of the pre-application patent quality assessment method based on cluster center representation according to the present invention.

As shown in FIG. 1, a first aspect of the present invention provides a pre-application patent quality assessment method based on cluster center representation, comprising:

S102, extracting keywords from the patent text input by the user and using them for retrieval, generating from the patent big data a sub-dataset whose feature similarity meets a preset standard, and generating a center representation of the sub-dataset through a clustering model;

S104, extracting the patent information to be predicted from the patent text input by the user, and generating a text representation of the patent information to be predicted;

S106, calculating the similarity between the patent information to be predicted and the center representation, and generating constraint information based on the similarity combined with patent quality indicators;

S108, training a patent quality assessment model with the constraint information, and obtaining a multi-dimensional quality evaluation result for the patent input by the user through the patent quality assessment model.

It should be noted that the patent text input by the user is preprocessed by word segmentation, normalization, stop-word removal and the like to generate a serialized representation of the patent text. The part-of-speech tag of each word vector in the serialized representation is determined, including prepositions, adjectives, nouns, proper nouns and so on, and the part-of-speech tags are used for sequence labeling. RoBERTa is used to truncate, chunk and embed the serialized representation; after RoBERTa encoding the embedding vectors are related to one another, which strengthens semantic learning, ties each token more closely to its context, and captures the semantic features expressed in different contexts. The embedding vectors of the patent text are obtained, preset phrases are selected by part-of-speech tag, the corresponding embedding vectors are selected by matching the position features of the preset phrases, and the selected embedding vectors are concatenated, brought to a uniform length, and passed through a linear layer for dimension transformation to obtain the concatenated embedding vectors. A self-attention mechanism is introduced over the embedding vectors of the patent text so that the self-attention weights strengthen their features, and cross-attention is introduced between the concatenated embedding vectors and the embedding vectors to obtain the neighborhood embedding vectors of the concatenated embedding vectors, strengthening contextual semantics. The attention-encoded embedding vector sequence and neighborhood embedding vector sequence are then obtained, where the embedding vector sequence carries global semantics and the neighborhood embedding vector sequence is rich in local contextual semantics. After dimensionality reduction, the similarity between the embedding vectors and the neighborhood embedding vectors in the sequences is calculated and used as the measure of importance, and the concatenated embedding vectors whose similarity meets the preset similarity threshold are decoded as the keyword extraction result. A retrieval index is built from the keywords and used to calculate keyword feature similarity over the massive patent big data, and the patent data that meet the preset similarity standard are collected into a sub-dataset containing the keywords.

FIG. 2 shows a flow chart of generating the center representation of the sub-dataset according to the present invention.

According to an embodiment of the present invention, generating the center representation of the sub-dataset through the clustering model is specifically:

S202, using the sparrow search algorithm to optimize the initial cluster centers of the sub-dataset, initializing the parameters of the sparrow search algorithm, calculating the fitness values of the sparrow population, and obtaining the best and worst fitness values and their corresponding positions;

S204, selecting discoverers, joiners and scouts and updating their positions, introducing an adaptive t-distribution mutation into the position update, iteratively calculating fitness and updating the sparrow positions, and, once the maximum number of iterations is reached, outputting the best sparrow position to obtain the cluster center matrix;

S206, obtaining the initial cluster centers from the cluster center matrix, using Euclidean distance as the metric, assigning each patent in the sub-dataset to the nearest initial cluster center, and updating the cluster center of each cluster after all patent data have been assigned;

S208, obtaining the final clustering result of the sub-dataset through iterative clustering, and generating the center representations of the sub-dataset from the resulting clusters.

It should be noted that the K-means clustering algorithm runs fast, but when the initial cluster centers are chosen at random, centers that are too concentrated or too dispersed lead to poor clustering and reduce the accuracy of the center representations obtained after clustering. The sparrow search algorithm has advantages such as fast convergence and reduces the influence of the initial cluster centers on the clustering result. The parameters of the sparrow search algorithm are initialized by setting the maximum number of iterations, the population size, the number of discoverers, the number of sentinels and the alarm value. The discoverers are the sparrows with the best positions, the remaining sparrows are followers (the joiners of S204), and the sentinels (the scouts of S204) are generated at random. The higher a sparrow's fitness, the higher its priority in obtaining food. Given the number of clusters as input, the sparrow search algorithm finds the cluster center matrix with the best fitness. To keep the clustering from falling into a local optimum, an adaptive t-distribution mutation is applied to the sparrow positions; the t-distribution combines the characteristics of the Cauchy and Gaussian distributions and so balances global exploration against local exploitation. The mutation yields a new position, and if the new position is better than the previous best position the position matrix is updated, until the best position matrix is output as the cluster center matrix. Clustering often yields more than one center representation: for example, if cluster analysis produces N categories, there are N corresponding center representations.
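The mutation-and-accept rule described above can be illustrated in isolation. The sketch below is not the full sparrow search algorithm: it shows only a t-distribution mutation with greedy acceptance. Using the iteration number as the degrees of freedom is an assumption (the source says only that the mutation is adaptive), and the toy fitness function (sum of squares, lower is better) is a placeholder for a real clustering fitness.

```python
import math
import random


def student_t(dof, rng):
    """Draw one Student t sample with `dof` degrees of freedom.

    Uses the identity t = Z / sqrt(V / dof), with Z standard normal and
    V chi-square(dof), where chi-square(dof) is Gamma(shape=dof/2, scale=2).
    """
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(dof / 2.0, 2.0)
    return z / math.sqrt(chi2 / dof)


def t_mutate(position, iteration, rng):
    """Perturb each coordinate by x * t(iteration); assumed form of the mutation."""
    return [x + x * student_t(iteration, rng) for x in position]


def fitness(position):
    """Placeholder fitness: sum of squares (lower is better)."""
    return sum(x * x for x in position)


rng = random.Random(0)  # fixed seed so the run is repeatable
best = [3.0, -2.0]
history = [fitness(best)]
for iteration in range(1, 51):
    candidate = t_mutate(best, iteration, rng)
    # Greedy acceptance: keep the mutated position only if it is better.
    if fitness(candidate) < fitness(best):
        best = candidate
    history.append(fitness(best))
print(history[0], history[-1])
```

With one degree of freedom the t-distribution is the heavy-tailed Cauchy distribution, which favors large exploratory jumps early on; as the degrees of freedom grow it approaches the Gaussian, which favors small local refinements in later iterations.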

It should be noted that the patent information to be predicted is extracted from the patent text input by the user according to preset paragraph positions, such as the claims or the abstract, and indicator keywords. The embedding vectors of the corresponding patent text are extracted and divided into word embedding vectors, sentence embedding vectors and paragraph embedding vectors; dividing the embedding vectors by position and level improves the efficiency of semantic recognition of the text. The embedding vectors of the patent information to be predicted are fed into a bidirectional long short-term memory network, and an attention mechanism is introduced so that the embedding vectors at the different levels are processed by a forward LSTM and a backward LSTM. The forward LSTM operates on the embedding vector input at time t and the output at time t-1 to obtain the forward output at time t, and the backward LSTM operates on the embedding vector input at time t and the output at time t+1 to obtain the backward output at time t. The forward and backward results are combined through the hidden layer, and the semantic features corresponding to the embedding vectors of the patent information to be predicted are output. The semantic features are then matched, by position encoding, with the embedding vectors at the different levels of the patent information to be predicted to generate the text representation of the patent information to be predicted.
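The forward and backward recurrences just described can be written compactly as follows. The notation is an assumption introduced for illustration: x_t is the embedding input at time t, the arrows mark the forward and backward outputs, and the weights W_f, W_b and bias b stand for the hidden-layer combination, whose exact form the source does not specify (concatenating the two outputs is another common choice).

```latex
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{LSTM}_{\mathrm{fwd}}\!\left(x_t,\ \overrightarrow{h}_{t-1}\right) \\
\overleftarrow{h}_t  &= \mathrm{LSTM}_{\mathrm{bwd}}\!\left(x_t,\ \overleftarrow{h}_{t+1}\right) \\
h_t &= W_f\,\overrightarrow{h}_t + W_b\,\overleftarrow{h}_t + b
\end{aligned}
```

Here h_t is the semantic feature at position t, which sees context from both directions of the text.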

图3示出了本发明构建专利质量评估模型的流程图。FIG3 shows a flow chart of constructing a patent quality assessment model according to the present invention.

According to an embodiment of the present invention, the patent quality assessment model is trained using the constraint information, specifically:

S302, constructing a patent quality assessment model, training a corresponding encoder for each category of patent quality indicator using that category's training data in the constraint information, and using the encoders of the different patent quality indicators to extract indicator features from the text representation of the patent information to be predicted;

S304, inputting the indicator features of the patent information to be predicted, together with the reciprocal of the average similarity between the text representation and the center representations, into different multi-layer perceptrons to obtain a feature importance matrix, using co-attention to obtain the attention distribution over the feature importance, and obtaining, by weighted calculation, the representations of the patent information to be predicted under the different constraints;

S306, fully connecting the indicator features with the weighted representations, outputting a vector through a multi-layer perceptron interaction network, converting the output vector into a probability distribution to obtain the predicted evaluation, and scoring with the MSE evaluation metric to obtain the quality assessment result of the patent information to be predicted.
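
The last two operations of step S306 can be sketched as follows. The patent does not say how the output vector is converted into a probability distribution; softmax is assumed here as the usual choice, and the function names are illustrative.

```python
import math

def softmax(logits):
    """Convert an output vector into a probability distribution (assumed conversion)."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mse(predicted, actual):
    """Mean squared error between predicted and reference values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
```

For example, `softmax([0.0, 0.0])` gives equal probabilities, and `mse([1, 2], [1, 4])` averages the squared errors 0 and 4.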

It should be noted that the similarity between the text representation of the patent information to be predicted and every center representation of the sub-dataset is calculated: after dimension alignment, the cosine similarity between the embedding vectors is computed, and when the cosine similarity exceeds a preset threshold, the semantic features of the patent information to be predicted at the corresponding position are extracted and used to correct the similarity. The patent information to be predicted is traversed to obtain all similarities, the mean of their absolute values is taken as the average similarity, and the reciprocal of the average similarity, which reflects the probability of grant to a certain extent, serves as one of the quality evaluation indicators in the constraint information. Patent quality evaluation instances are obtained using a big data engine, and patent quality evaluation indicators are extracted from them, for example the number and length of claims, the number of patent citations, the number of non-patent literature citations, the technology life cycle, the patent class, the patent family size, and the number of inventors and applicants; principal component analysis is performed on these indicators to identify key influencing factors. From the patent quality evaluation instances, the interaction relationships between the key influencing factors and the patent texts, and among the different key influencing factors, are obtained; triples are built from these relationships and the attributes of the key influencing factors, and a knowledge graph convolutional neural network learns the graph structure to construct a knowledge graph. In a knowledge graph, the more nodes a node is connected to, the richer the information it is likely to carry. The number of relationship edges directly connected to each key influencing factor is therefore used to calculate the node's centrality, which represents the importance of that factor: more central nodes are more important than others. A preset number of key influencing factors is selected by importance, and the corresponding indicator variables are obtained to form the constraint information.
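
Two of the calculations in this paragraph can be sketched in a few lines, under stated assumptions: vectors are already dimension-aligned and non-zero, the threshold-based similarity correction is omitted, and centrality is taken as the plain count of directly connected edges. All function names and the sample data are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two aligned, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def inverse_mean_similarity(text_vec, centers):
    """Reciprocal of the mean absolute cosine similarity to all center representations."""
    sims = [abs(cosine(text_vec, c)) for c in centers]
    return 1.0 / (sum(sims) / len(sims))

def top_factors_by_degree(triples, factors, k):
    """Rank factors by the number of edges touching them and keep the top k."""
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(factors, key=lambda f: (-degree[f], f))[:k]
```

A text that is close to many cluster centers gets a high average similarity and therefore a low reciprocal, which matches the paragraph's reading of the reciprocal as a rough proxy for the chance of grant.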

A patent quality assessment model is constructed. A multi-scale encoder module extracts indicator features from the text representation of the patent information to be predicted, giving the representation of the text representation under each patent assessment indicator variable in the constraint information. A co-attention mechanism is used to estimate the differing importance of these representations; the calculation yields an attention distribution over the representations of the text representation under the j-th and n-th patent assessment indicator variables, where m denotes the total number of representations. The indicator features before the attention mechanism are connected to the corresponding indicator representations after the attention mechanism, the result is output through a multi-layer perceptron interaction network, and the MSE evaluation metric is used for scoring to obtain the quality assessment result of the patent information to be predicted. The smaller the MSE, the closer the predicted value output by the quality assessment model is to the true value, which is taken to indicate that the quality of the patent information to be predicted is better.
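
The symbol definitions in this paragraph (an attention distribution, representations indexed by j and n, and a total of m) are consistent with a softmax normalization. As an assumption rather than the patent's stated formula, one common form is shown below, where h_j denotes the representation under the j-th indicator variable, s(·) is a hypothetical scalar scoring function, α_j is the attention weight, and h̃ is the weighted representation:

```latex
\alpha_j = \frac{\exp\bigl(s(h_j)\bigr)}{\sum_{n=1}^{m} \exp\bigl(s(h_n)\bigr)},
\qquad
\tilde{h} = \sum_{j=1}^{m} \alpha_j \, h_j
```

Under this form the weights α_j are positive and sum to 1 over the m representations, so h̃ is a convex combination of them.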

It should be noted that the historical quality assessment results of an enterprise's patent texts are obtained, and a drafting profile of the enterprise's patent texts is constructed from them. A personalized database is built from the drafting profiles of different time periods. Within it, quality evaluation indicators that differ markedly from the other indicators in a drafting profile are marked, and improvement directions for patent drafting are generated in the current drafting workflow according to the marked indicators. The improvement directions are traced back to their source using an ant colony algorithm; the influencing factors of the abnormal quality evaluation indicators are obtained along the tracing path; optimization measures are retrieved according to those factors to improve the drafting process for patent texts and technical disclosure documents; and the corresponding drafting workflow is updated.
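
The marking step can be sketched as a simple outlier check. The patent does not define how the gap between indicators is measured; this sketch assumes scores on a common scale and flags an indicator when it differs from the median of the other indicators by more than a hypothetical threshold `gap`. The ant colony tracing step is not covered.

```python
from statistics import median

def flag_outlier_indicators(profile, gap=0.2):
    """Return indicator names whose score is far from the median of the others.

    `profile` maps indicator name -> score; `gap` is an assumed threshold.
    """
    flagged = []
    for name, score in profile.items():
        others = [s for n, s in profile.items() if n != name]
        if others and abs(score - median(others)) > gap:
            flagged.append(name)
    return flagged
```

With the illustrative profile `{"claims": 0.8, "citations": 0.75, "novelty": 0.3, "family": 0.7}`, only `novelty` lies more than 0.2 from the median of the remaining scores.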

A second aspect of the present invention further provides a pre-application patent quality assessment system 4 based on cluster center representation. The system comprises a memory 41, a processor 42, a user interaction module 43, an assessment dataset generation module 44, a quality assessment module 45, and a data storage management module 46, wherein the memory 41 stores, and the processor 42 executes, a program of the pre-application patent quality assessment method based on cluster center representation;

the user interaction module 43 is used for the user to input a keyword group that determines the patent data subset for assessment, serves as the input window for the patent information to be predicted, and returns and displays the system's assessment result to the user;

the assessment dataset generation module 44 generates a sub-dataset from the patent big dataset according to the keyword group provided by the user;

the quality assessment module 45 is responsible for performing the quality assessment based on the patent information to be assessed and the sub-dataset;

the data storage management module 46 is responsible for storing the patent big dataset and the patent subsets generated from user keyword groups, to facilitate running non-real-time assessment tasks.
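
How modules 44 and 46 might cooperate can be sketched as below. This is an illustration only: the class names, the dictionary layout of a patent record, and the plain substring match are assumptions, whereas the method itself describes retrieval based on extracted keywords and feature similarity.

```python
from dataclasses import dataclass, field

@dataclass
class DataStorage:
    """Stands in for module 46: holds the full dataset and cached subsets."""
    patents: list  # each patent is assumed to be a dict with "id" and "text"
    subsets: dict = field(default_factory=dict)

class DatasetGenerator:
    """Stands in for module 44: builds a sub-dataset from a keyword group."""

    def __init__(self, storage):
        self.storage = storage

    def generate(self, keywords):
        # Normalize the keyword group so equivalent requests share one cache entry.
        key = tuple(sorted(k.lower() for k in keywords))
        if key not in self.storage.subsets:
            self.storage.subsets[key] = [
                p for p in self.storage.patents
                if any(k in p["text"].lower() for k in key)
            ]
        return self.storage.subsets[key]
```

Caching the subset under its keyword group mirrors the stated purpose of module 46: a subset generated once can be reused later by a non-real-time assessment task.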

A third aspect of the present invention further provides a computer-readable storage medium containing a program of the pre-application patent quality assessment method based on cluster center representation. When the program is executed by a processor, the steps of the pre-application patent quality assessment method based on cluster center representation described in any of the above are implemented.

In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, each unit may stand alone as a unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a combination of hardware and software functional units.

A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Alternatively, if the integrated unit of the present invention is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. On this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a removable storage device, ROM, RAM, a magnetic disk, or an optical disc.

The above are only specific embodiments of the present invention, and the protection scope of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within its protection scope. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.

Claims

1. A pre-application patent quality assessment method based on cluster center representation, characterized in that

a patent quality assessment model is trained using constraint information, and a multi-dimensional quality evaluation result is obtained for the patent input by the user through the patent quality assessment model.

2. The pre-application patent quality assessment method based on cluster center representation according to claim 1, characterized in that keywords are extracted from the patent text input by the user for retrieval, and a sub-dataset whose feature similarity meets a preset standard is generated from the patent big data, specifically:

obtaining the attention-encoded embedding vector sequence and the neighborhood embedding vector sequence, calculating the similarity between the embedding vectors and the neighborhood embedding vectors in the sequences, and decoding the concatenated embedding vectors whose similarity meets a preset similarity threshold as the keyword extraction result;

3. The pre-application patent quality assessment method based on cluster center representation according to claim 1, characterized in that the center representation of the sub-dataset is generated by a clustering model, specifically:

4. The pre-application patent quality assessment method based on cluster center representation according to claim 1, characterized in that the patent information to be predicted is extracted from the patent text input by the user and the text representation of the patent information to be predicted is generated, specifically:

5. The pre-application patent quality assessment method based on cluster center representation according to claim 1, characterized in that the constraint information is generated based on the similarity in combination with the patent quality indicators, specifically:

6. The pre-application patent quality assessment method based on cluster center representation according to claim 1, characterized in that the patent quality assessment model is trained using the constraint information, specifically:

7. A pre-application patent quality assessment system based on cluster center representation, characterized in that the system comprises a memory, a processor, a user interaction module, an assessment dataset generation module, a quality assessment module, and a data storage management module, wherein the memory stores, and the processor executes, a program of the pre-application patent quality assessment method based on cluster center representation;

8. The pre-application patent quality assessment system based on cluster center representation according to claim 7, characterized in that the center representation of the sub-dataset is generated in the assessment dataset generation module, specifically:

9. The pre-application patent quality assessment system based on cluster center representation according to claim 7, characterized in that the constraint information of the patent quality assessment model is obtained in the quality assessment module, specifically:

10. The pre-application patent quality assessment system based on cluster center representation according to claim 7, characterized in that the patent quality assessment model in the quality assessment module is specifically:

fully connecting the indicator features with the weighted representations, outputting a vector through a multi-layer perceptron interaction network, converting the output vector into a probability distribution to obtain the predicted evaluation, and scoring with the MSE evaluation metric to obtain the quality assessment result of the patent information to be predicted.