CN113420112B

Movatterモバイル変換

Info

Publication number: CN113420112B
Application number: CN202110685518.3A
Authority: CN
Inventors: 周军; 张震; 杨家豪; 沈亮; 张鹏远; 王立强; 颜永红
Original assignee: Institute of Acoustics CAS; National Computer Network and Information Security Management Center
Current assignee: Institute of Acoustics CAS; National Computer Network and Information Security Management Center
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2025-02-18
Anticipated expiration: 2041-06-21
Also published as: CN113420112A

Abstract

Translated fromChinese

本发明涉及一种基于无监督学习的新闻实体分析方法及装置。方法包括：对待处理的多条新闻数据中的每条新闻数据分别进行分词处理，将分词处理后的每条新闻中包含的多个实体进行标注以得到标注结果；基于所述标注结果构建分布式表示模型，得到所述多个实体的分布式表示信息，所述分布式表示信息标识为实体向量；根据所述多个实体的分布式表示信息，对所述多个实体进行聚类分析以得到聚类结果。本申请将分布式的思想引入新闻实体的处理当中，通过新闻实体所处位置的上下文来得到实体的分布式表示，通过对实体的聚类分析来得到实体的聚类结果。

The present invention relates to a news entity analysis method and device based on unsupervised learning. The method comprises: performing word segmentation processing on each of the multiple news data to be processed, and labeling the multiple entities contained in each news after the word segmentation processing to obtain labeling results; constructing a distributed representation model based on the labeling results to obtain distributed representation information of the multiple entities, and the distributed representation information is identified as an entity vector; and performing cluster analysis on the multiple entities according to the distributed representation information of the multiple entities to obtain clustering results. The present application introduces the idea of distribution into the processing of news entities, obtains the distributed representation of entities through the context of the location of the news entities, and obtains the clustering results of entities through cluster analysis of the entities.