CN117556814B


Info

Publication number: CN117556814B
Application number: CN202310921083.7A
Authority: CN
Inventors: 拥措; 尹宗鹤; 尼玛扎西; 拉毛杰; 万玛才旦
Original assignee: Tibet University
Current assignee: Tibet University
Priority date: 2023-07-26
Filing date: 2023-07-26
Publication date: 2024-12-06
Anticipated expiration: 2043-07-26
Also published as: CN117556814A

Abstract



The present invention proposes an integrated method and system for Tibetan word segmentation and part-of-speech tagging, and relates to the field of electronic information. The method obtains Tibetan text entered by a user, invokes an integrated model, splits the text into Tibetan syllables and non-Tibetan character blocks, performs CRF prediction to obtain the optimal label sequence, and then arranges the written form of each Tibetan syllable according to the predicted labels to produce the corresponding tagging result. Because the integrated model performs word segmentation and part-of-speech tagging jointly in a single network, it avoids the error accumulation of two-stage pipelines, in which word segmentation errors carry over into part-of-speech tagging errors. The method can therefore handle Tibetan word segmentation and part-of-speech tagging more accurately, which increases the practicality of the solution.
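
The steps named in the abstract (unit splitting, CRF decoding, assembling the result) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the patent: the joint `B-POS`/`I-POS` label scheme, the function names `split_units`, `viterbi`, and `merge_units`, and the emission and transition scores, which in a real system would come from the trained integrated model. The decoder is standard Viterbi over a linear-chain CRF, and the final step simply concatenates units, which is a simplification of the syllable written-form handling the abstract mentions.

```python
import re

NEG = -1e9  # stands in for a forbidden or unseen score

# Hypothetical tokenizer: a unit is either a run of Tibetan-block characters
# with an optional trailing tsheg (U+0F0B), a lone tsheg, or a run of
# non-Tibetan, non-space characters.
UNIT_RE = re.compile(
    r"[\u0f00-\u0f0a\u0f0c-\u0fff]+\u0f0b?|\u0f0b|[^\u0f00-\u0fff\s]+"
)


def split_units(text):
    """Split text into Tibetan syllables and non-Tibetan character blocks."""
    return UNIT_RE.findall(text)


def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label path for a linear-chain CRF.

    emissions: one dict per unit mapping label -> score (missing = NEG).
    transitions: dict mapping (previous_label, label) -> score (missing = 0).
    """
    best = [{lab: emissions[0].get(lab, NEG) for lab in labels}]
    backpointers = []
    for emission in emissions[1:]:
        scores, pointers = {}, {}
        for cur in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emission.get(cur, NEG)
            )
            pointers[cur] = prev
        best.append(scores)
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def merge_units(units, path):
    """Group units into (word, pos) pairs using joint B-POS / I-POS labels."""
    words = []
    for unit, label in zip(units, path):
        position, pos = label.split("-", 1)
        if position == "B" or not words:
            words.append([unit, pos])
        else:
            words[-1][0] += unit  # continue the current word
    return [(word, pos) for word, pos in words]
```

Because each label carries both the word boundary and the part of speech, a single decoding pass yields both outputs, which is the property the abstract credits with avoiding error accumulation between two separate stages.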