CN113380418A


Info

Publication number: CN113380418A
Application number: CN202110690776.0A
Authority: CN
Inventors: 毛科技; 樊鑫奔; 王宇翔; 黄玉娇
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2021-06-22
Filing date: 2021-06-22
Publication date: 2021-09-10

Abstract

Translated from Chinese

A system for identifying depression through dialog text analysis comprises three modules connected in sequence: a preprocessing module for recordings of consultations with psychiatric experts, a training module for the depression dialog text analysis and recognition model, and a module that recognizes a user's depression status. Building on conventional text classification, the depression dialog recognition model combines the RoBERTa pre-trained model with a Bi-LSTM model and an attention mechanism. The model divides text into blocks more flexibly, can determine the next block to process in any direction, and applies a recurrence mechanism that passes information between blocks, which makes it possible to read long texts. An attention mechanism is also added at the output of the model. By distinguishing the importance of different features, ignoring unimportant ones, and focusing on important ones, it extracts keywords from very long texts more accurately, improves classification accuracy, and identifies with high accuracy whether a user suffers from depression.

Description

System for analyzing and identifying depression through dialog text

Technical Field

The present invention relates to a system for identifying patients with depression.

Background

In recent years, deep learning models have become the mainstream approach to text classification, chiefly RNNs and CNNs. Building on these basic deep models, some work has successfully integrated information from different perspectives into the text classification task. In addition, with the recent exploration of self-attention mechanisms and the development of pre-trained models in natural language processing, BERT has become the most popular and most effective pre-trained model for text classification and performs very well in text sentiment classification. Language and dialog are the main data sources doctors use to diagnose and treat mental illness, so applying artificial intelligence to the semantic analysis of patients' speech can assist in the early warning of mental illness. Outside China, automatic semantic analysis of speech in clinical interview records has been studied and used to predict the onset of mental illness in high-risk young people within 2.5 years of the baseline assessment. Facebook likewise assesses users' negative emotions, and even tendencies toward severe mental illness, using a psychological chatbot based on deep-learning natural language understanding, thereby achieving early identification and early warning of depression.

At present, there is no related research in China on intelligent mental illness screening based on open question-and-answer dialog using artificial intelligence. An efficient deep learning model therefore needs to be designed, then trained and tested on question-and-answer data between expert doctors and patients, with the goal of high assessment accuracy.

Disclosure of Invention

The present invention overcomes the above-mentioned shortcomings of the prior art and provides a system for analyzing and identifying depression through dialog text.

The technical solution adopted by the invention to solve this technical problem is as follows:

A system for analyzing and identifying depression through dialog text comprises a psychiatric expert consultation recording data preprocessing module, a depression dialog text analysis and recognition model training module, and a user depression status recognition module, which are connected in sequence.

The psychiatric expert consultation recording data preprocessing module specifically: organizes the consultation recordings of psychiatric experts and extracts the core consultation fragments for each individual; converts the consultation speech fragments into text using a commercially available Chinese speech-to-text tool; and manually checks and corrects errors in the speech recognition results.
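The grouping and manual-correction steps of this module can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Segment` structure, the `(wrong, right)` correction pairs, and the function names are all hypothetical, and the speech-to-text step itself is assumed to have already produced the `text` field, since the source does not name the tool used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed core fragment of a consultation (hypothetical structure)."""
    person_id: str  # anonymized identifier of the individual
    order: int      # position of the fragment within the consultation
    text: str       # raw speech-to-text output for this fragment


def apply_corrections(text, corrections):
    """Apply manually reviewed fixes, given as (wrong, right) pairs, in order."""
    for wrong, right in corrections:
        text = text.replace(wrong, right)
    return text


def build_dialog_texts(segments, corrections):
    """Group fragments by individual and join them into one long dialog text."""
    grouped = defaultdict(list)
    for seg in segments:
        grouped[seg.person_id].append(seg)

    dialogs = {}
    for person_id, segs in grouped.items():
        segs.sort(key=lambda s: s.order)  # restore conversation order
        joined = "\n".join(s.text for s in segs)
        dialogs[person_id] = apply_corrections(joined, corrections)
    return dialogs
```

Keeping one long text per individual matches the later steps, which pair each text with the diagnosis of the person it belongs to.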

The depression dialog text analysis and recognition model training module performs the following steps:

1) Take the text data set produced by the psychiatric expert consultation recording data preprocessing module, extract the diagnosis of the person each text belongs to as its label, and pair the texts with the labels to build the training set, validation set, and test set.

2) Build a Bi-LSTM + Attention + RoBERTa-wwm-ext-large text classification model, set the number of neurons in the Bi-LSTM hidden layer to 256, and output 0 or 1 (whether depression is present).

3) Construct a binary cross-entropy loss function. The formula of the loss function is given as (1-1).

4) Feed the long dialog texts of the training set and their labels into the depression recognition model as input signals. The RoBERTa-wwm-ext-large Chinese pre-trained model converts the dialog text into 768-dimensional word vectors. The word vectors are fed into the Bi-LSTM layer, and the output vector at each time step is fed into the Attention layer. The Attention layer computes a weight for each time step, weights the vectors of all time steps, and takes the result as the feature vector; a Softmax function then classifies it to produce the output signal, namely whether depression is present.
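Formula (1-1), referenced in step 3, is not reproduced in this text. Assuming it is the standard binary cross-entropy, it would take the form below, where $N$ is the number of training dialogs, $y_i \in \{0, 1\}$ is the diagnosis label of dialog $i$, and $\hat{y}_i$ is the predicted probability of depression for that dialog (for example, the Softmax output for class 1). These symbols are assumptions introduced here for illustration and are not taken from the source.

```latex
L = -\frac{1}{N} \sum_{i=1}^{N}
    \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]
```

The loss is small when the predicted probability is close to the label and grows without bound as a confident prediction turns out to be wrong.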

The user depression status recognition module selects the dialog audio to be diagnosed and converts it into dialog text; it then loads the depression recognition model saved by the depression dialog text analysis and recognition model training module and inputs the long dialog text to obtain a judgment result.

The invention has the following beneficial effects:

(1) Whether a user has depression is identified with high accuracy;

(2) Keywords are extracted more accurately from very long dialog texts, which greatly improves recognition;

(3) The composite model outperforms the individual models.

Drawings

Fig. 1 is the overall flow chart of the present invention.

Fig. 2 is a structural diagram of a depression recognition model used in the present invention.

Detailed Description

The technical scheme of the invention is further explained by combining the attached drawings.

A system for analyzing and identifying depression through dialog text comprises a psychiatric expert consultation recording data preprocessing module, a depression dialog text analysis and recognition model training module, and a user depression status recognition module, which are connected in sequence.

The user depression status recognition module specifically: selects the dialog audio to be diagnosed and converts it into dialog text; then loads the depression recognition model saved by the depression dialog text analysis and recognition model training module and inputs the long dialog text to obtain a judgment result.

Building on conventional text classification, the depression dialog recognition model of the invention combines the RoBERTa pre-trained model with a Bi-LSTM model and an attention mechanism. The model divides the text into blocks more flexibly, can determine the next block to process in any direction, and applies a recurrence mechanism that passes information between blocks, which makes it possible to read long texts. An Attention mechanism is also added at the output of the model: it distinguishes the importance of different features, ignores unimportant features, and focuses on important ones, so that keywords are extracted from very long texts more accurately, classification accuracy improves, and whether the user suffers from depression is identified with high accuracy. First, the consultation recordings of psychiatric experts are organized, the core consultation fragments are extracted for each individual, and the consultation speech fragments are converted into text. Then, a Bi-LSTM + Attention + RoBERTa-wwm-ext-large text classification model is built and trained on the core consultation dialog texts. Finally, the long dialog text is input into the depression recognition model to obtain a judgment result.
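The attention step at the output of the model, which weights each time step and uses the weighted result as the feature vector for Softmax classification, can be sketched in plain Python as follows. This is a minimal sketch under assumptions, not the patent's implementation: the source does not state the scoring function, so a common choice (a dot product between a learned vector and the tanh of each Bi-LSTM output) is assumed here, the weighting is assumed to be a weighted sum, and `query`, `class_weights`, and `class_bias` are hypothetical parameters that would be learned during training.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step's vector and sum them into one feature vector.

    hidden_states: list of T vectors (e.g. Bi-LSTM outputs), each of length D.
    query: hypothetical learned vector of length D used to score time steps.
    """
    # Assumed scoring function: score_t = query . tanh(h_t)
    scores = [sum(q * math.tanh(h) for q, h in zip(query, step))
              for step in hidden_states]
    weights = softmax(scores)  # one weight per time step, summing to 1
    dim = len(hidden_states[0])
    pooled = [sum(w * step[d] for w, step in zip(weights, hidden_states))
              for d in range(dim)]
    return pooled, weights


def classify(feature, class_weights, class_bias):
    """Linear layer followed by Softmax over the two classes (0 and 1)."""
    logits = [sum(w * f for w, f in zip(row, feature)) + b
              for row, b in zip(class_weights, class_bias)]
    return softmax(logits)
```

With the hidden size of 256 given in the text, each Bi-LSTM output vector would have length 512 if the forward and backward states are concatenated, which is the usual convention but is not stated in the source.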

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. A system for identifying depression through dialog text analysis, characterized in that: the system comprises a psychiatric expert consultation recording data preprocessing module, a depression dialog text analysis and recognition model training module, and a user depression status recognition module, which are connected in sequence;

the psychiatric expert consultation recording data preprocessing module specifically comprises: organizing the consultation recordings of psychiatric experts and extracting the core consultation fragments for each individual; converting the consultation speech fragments into text using a commercially available Chinese speech-to-text tool; and manually checking and correcting errors in the speech recognition results;

the depression dialog text analysis and recognition model training module builds a Bi-LSTM + Attention + RoBERTa-wwm-ext-large text classification model and trains it on the core consultation dialog texts, specifically comprising:

1) taking the text data set produced by the psychiatric expert consultation recording data preprocessing module, extracting the diagnosis of the person each text belongs to as its label, and pairing the texts with the labels to build the training set, validation set, and test set;

2) building a Bi-LSTM + Attention + RoBERTa-wwm-ext-large text classification model, setting the number of neurons in the Bi-LSTM hidden layer to 256, and outputting 0 or 1 (whether depression is present);

3) constructing a binary cross-entropy loss function; the formula of the loss function is given as (1-1);

4) feeding the long dialog texts of the training set and their labels into the constructed depression recognition model as input signals, and using the RoBERTa-wwm-ext-large Chinese pre-trained model to convert the dialog text into 768-dimensional word vectors; feeding the word vectors into the Bi-LSTM layer, and feeding the output vector at each time step into the Attention layer; the Attention layer computes a weight for each time step, weights the vectors of all time steps, and takes the result as the feature vector, and a Softmax function classifies it to produce the output signal, namely whether depression is present;