CN111125317A - Model training, classification, system, device and medium for conversational text classification - Google Patents


Info

Publication number
CN111125317A
CN111125317A (application CN201911374381.9A)
Authority
CN
China
Prior art keywords
data
vector
model
classification
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911374381.9A
Other languages
Chinese (zh)
Inventor
杨森
罗超
胡泓
王思博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd
Priority to CN201911374381.9A
Publication of CN111125317A
Legal status: Pending

Abstract

The invention discloses model training and classification methods, a system, a device, and a medium for conversational text classification, comprising the following steps: carrying out violation labeling on training data to obtain first data, wherein the first data comprises original dialogue sentences; segmenting the first data into a plurality of word segments; matching the word segments with pre-trained word vectors to obtain a first word segmentation vector; matching the first word segmentation vector with pre-trained character vectors to obtain a second word segmentation vector; and inputting the second word segmentation vector into a HAN model for training. The model training method for conversational text classification classifies conversational texts with a hierarchical attention (HAN) algorithm, avoiding the loss of inter-sentence relationships and of semantics and improving the accuracy of classification results; meanwhile, both word-vector and character-vector representations are introduced, reducing the influence of OOV (out-of-vocabulary) words and improving the accuracy of the classification results and the precision of the model.

Description

Model training, classification, system, device and medium for conversational text classification
Technical Field
The present invention relates to the field of text classification, and more particularly to model training, classification, systems, devices and media for conversational text classification.
Background
Current model training methods for text classification fall into two main categories. The first splices all sentences of a document together and then performs text classification modeling based on a CNN (convolutional neural network), an RNN (recurrent neural network), or BERT (Bidirectional Encoder Representations from Transformers); splicing every sentence, however, loses the relationships between sentences. The second limits the number of sentences in the document and then splices and classifies them; compared with the former method, although the text length is reduced, the loss of some sentences is likely to lose semantics, so the accuracy of the final classification result is low.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, in which sentence splicing loses the relationships between sentences and limiting the number of sentences in a text loses semantics, both of which lower the accuracy of classification results, and provides model training and classification methods, a system, a device, and a medium for conversational text classification.
The invention solves the technical problems through the following technical scheme:
a model training method for conversational text classification comprises the following steps:
carrying out violation annotation on training data to obtain first data, wherein the first data comprises original dialogue sentences;
segmenting the first data into a plurality of segments;
matching the word segmentation with a pre-trained word vector to obtain a first word segmentation vector;
matching the first word segmentation vector with a pre-trained character vector to obtain a second word segmentation vector;
and inputting the second word segmentation vector into a HAN (hierarchical attention network) model for training.
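The five steps above can be sketched end to end as follows. This is a minimal illustration only; the helper names, toy vectors, and the zero-vector fallback for unknown tokens are assumptions, not the patent's implementation:

```python
def prepare_training_input(raw_dialogues, word_vecs, char_vecs, tokenize):
    """Toy sketch: labeled dialogues -> word segments -> word vectors
    -> word vectors enriched with character vectors (the HAN model input)."""
    samples = []
    for text, label in raw_dialogues:                 # step 1: labeled first data
        tokens = tokenize(text)                       # step 2: word segmentation
        first = [word_vecs.get(t, [0.0]) for t in tokens]   # step 3: word vectors
        # step 4: append character vectors to reduce the OOV effect
        second = [wv + [c for ch in tok for c in char_vecs.get(ch, [0.0])]
                  for tok, wv in zip(tokens, first)]
        samples.append((second, label))               # step 5: fed to the HAN model
    return samples
```

A real system would use high-dimensional pre-trained embeddings; here one- and two-dimensional toy vectors stand in for them.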
Preferably, the step of marking the training data for violation to obtain the first data specifically includes:
dividing the training data into a plurality of first sample data by using a clustering algorithm;
extracting a plurality of pieces of data from each piece of the first sample data as second sample data;
calculating the edit distance between any piece of the second sample data and the marked training data;
and if the editing distance is larger than a preset value, carrying out violation marking on the first sample data to obtain first data.
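The edit-distance comparison in the steps above can be sketched as follows. The function names and the threshold value are illustrative assumptions; the patent specifies the preset value only later, in embodiment 2:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,         # deletion
                dp[j - 1] + 1,     # insertion
                prev + (ca != cb)  # substitution (free if chars match)
            )
    return dp[len(b)]

def needs_labeling(candidate: str, labeled: str, threshold: int = 5) -> bool:
    # If the candidate differs strongly from already-labeled data,
    # flag it for violation annotation so the sample set stays diverse.
    return edit_distance(candidate, labeled) > threshold
```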
Preferably, the step of segmenting the first data into a plurality of segments specifically includes:
preprocessing the first data to obtain second data;
comparing the second data with preset model data to filter meaningless sentences in the second data to obtain third data;
segmenting the third data into a plurality of segments.
Preferably, the step of inputting the second segmentation vector into the HAN model for training specifically includes:
and inputting the second word segmentation vector and a preset feature vector into the HAN model.
Preferably, the model training method further comprises:
judging whether the model output result meets a preset threshold value, if so, taking the model output result as a classification result; if not, labeling the training data with the confidence coefficient of the model output result smaller than the preset threshold value again to obtain fourth data;
and re-inputting the fourth data into the HAN model, and returning to the step of judging whether the model output result meets the preset threshold.
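The judge, relabel, and retrain cycle in the two steps above can be sketched as follows. This is a hedged illustration: `train`, `predict`, and `relabel` are hypothetical placeholders for model training, confidence scoring, and manual re-annotation, and the round limit is an assumption:

```python
def active_learning_loop(data, train, predict, relabel,
                         threshold=0.9, max_rounds=5):
    """Retrain until every output's confidence meets the threshold,
    sending low-confidence samples back for re-annotation each round."""
    for _ in range(max_rounds):
        model = train(data)
        scored = [(x, predict(model, x)) for x in data]
        uncertain = [x for x, conf in scored if conf < threshold]
        if not uncertain:            # all outputs meet the preset threshold
            return model
        # re-label the uncertain samples (fourth data) and iterate
        data = relabel(uncertain) + [x for x, c in scored if c >= threshold]
    return model
```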
A method of classifying conversational text, the method comprising the steps of:
obtaining dialogue type text data, wherein the dialogue type text data comprises original dialogue sentences;
segmenting the conversational text data into a plurality of segments;
matching the word segmentation with a pre-trained word vector to obtain a first word segmentation vector;
matching the first word segmentation vector with a pre-trained character vector to obtain a second word segmentation vector;
and inputting the second segmentation vector into an HAN model trained by adopting any one of the above dialogue type text classification model training methods to obtain a classification result.
Preferably, the step of inputting the second segmentation vector into the HAN model trained by using any one of the above methods for model training for conversational text classification to obtain a classification result specifically includes:
and inputting the second word segmentation vector and a preset feature vector into an HAN model trained by adopting any one of the dialogue type text classification model training methods to obtain a classification result.
Preferably, the pre-trained word vectors include Elmo (Embeddings from Language Models) word vectors and GloVe (Global Vectors for Word Representation) word vectors;
the pre-trained character vectors comprise GloVe character vectors.
A model training system for conversational text classification comprising the following modules:
the first data acquisition module is used for carrying out violation marking on training data to obtain first data, and the first data comprises original dialogue sentences;
the first word segmentation acquisition module is used for segmenting the first data into a plurality of words;
the first vector acquisition module is used for matching the word segmentation with a pre-trained word vector to obtain a first word segmentation vector;
the second vector acquisition module is used for matching the first word segmentation vector with a pre-trained character vector to obtain a second word segmentation vector;
and the model input module is used for inputting the second word segmentation vector into the HAN model for training.
Preferably, the first data acquisition module includes:
a first sample acquisition unit for dividing the training data into a plurality of pieces of first sample data using a clustering algorithm;
a second sample obtaining unit, configured to extract a plurality of pieces of data from each piece of the first sample data as second sample data;
the editing distance calculation unit is used for calculating the editing distance between any piece of second sample data and the marked training data;
and the first data acquisition unit is used for carrying out violation marking on the first sample data to obtain first data if the editing distance is larger than a preset value.
Preferably, the first word segmentation obtaining module includes:
the second data acquisition unit is used for preprocessing the first data to obtain second data;
the third data acquisition unit is used for comparing the second data with preset model data and filtering meaningless sentences in the second data to obtain third data;
and the word segmentation acquisition unit is used for segmenting the third data into a plurality of words.
Preferably, the model input module is specifically configured to input the second segmentation vector and a preset feature vector to the HAN model.
Preferably, the model training system further comprises:
a result judging module for judging whether the output result of the model meets a preset threshold value,
the first classification result acquisition module is used for taking the model output result as a classification result when the model output result meets a preset threshold value;
the fourth data labeling module is used for labeling the training data with the confidence coefficient of the model output result smaller than the preset threshold value again to obtain fourth data when the model output result does not meet the preset threshold value;
and the second classification result acquisition module is used for inputting the fourth data to the HAN model again and returning the fourth data to the result judgment module.
A classification system for conversational text, the classification system comprising the following modules:
the text data acquisition module is used for acquiring conversational text data, which comprises original dialogue sentences;
the second segmentation acquisition module is used for segmenting the dialogue type text data into a plurality of segments;
the first word segmentation vector acquisition module is used for matching the word segmentation with a pre-trained word vector to obtain a first word segmentation vector;
the second word segmentation vector acquisition module is used for matching the first word segmentation vector with a pre-trained character vector to obtain a second word segmentation vector;
and the second classification result acquisition module is used for inputting the second segmentation vector into an HAN model trained by adopting any one of the dialogue type text classification model training systems to obtain a classification result.
Preferably, the second classification result obtaining module is further configured to input the second segmentation vector and a preset feature vector to an HAN model trained by using any one of the above model training systems for conversational text classification, so as to obtain a classification result.
Preferably, the pre-trained word vectors include Elmo word vectors and GloVe word vectors;
the pre-trained character vectors comprise GloVe character vectors.
An electronic device comprising a memory, a processor and a computer program stored on the memory and operable on the processor, the processor implementing any of the above methods of model training for conversational text classification or the methods of classification of conversational text when executing the program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method for model training for conversational text classification or the method for classifying conversational text described in any of the above.
The positive effects of the invention are as follows:
1. conversational texts are classified with a hierarchical attention (HAN) algorithm, which avoids losing inter-sentence relationships and semantics and improves the accuracy of classification results;
2. word-vector and character-vector representations are introduced, reducing the influence of OOV (out-of-vocabulary) words and improving the accuracy of the classification results and the precision of the model;
3. feature-vector information is added, further improving the accuracy of the classification results and the precision of the model;
4. data iteration is performed in an active-learning manner, improving data quality and further optimizing the model.
Drawings
Fig. 1 is a flowchart of the model training method for conversational text classification according to preferred embodiment 1 of the present invention.
Fig. 2 is a flowchart of the model training method for conversational text classification according to preferred embodiment 2 of the present invention.
Fig. 3 is a flowchart of the method for classifying conversational text according to preferred embodiment 3 of the present invention.
Fig. 4 is a block diagram of the model training system for conversational text classification according to embodiment 4 of the present invention.
Fig. 5 is a block diagram of the model training system for conversational text classification according to embodiment 5 of the present invention.
Fig. 6 is a block diagram of the conversational text classification system according to embodiment 6 of the present invention.
Fig. 7 is a block diagram of an electronic device according to preferred embodiment 7 of the invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
Example 1
As shown in fig. 1, the method for training a dialog-type text classification model of the present embodiment includes the following steps:
s101, carrying out violation annotation on training data to obtain first data, wherein the first data comprises original dialogue sentences;
s102, segmenting the first data into a plurality of word segments;
s103, matching the word segmentation with the pre-trained word vector to obtain a first word segmentation vector;
specifically, matching is performed from pre-trained word vectors through the word segmentation result of the previous step, wherein the word vectors comprise a glove word vector and an Elmo word vector. The Glove word vector and the Elmo word vector are obtained by collecting a large amount of high-quality text data which are open on the internet and text data in the hotel field, preprocessing the high-quality text data and the text data and then training the preprocessed high-quality text data and the preprocessed high-quality text data based on unsupervised learning, and the purpose of obtaining a universal language model is achieved.
S104, matching the first word segmentation vector with the pre-trained character vector to obtain a second word segmentation vector;
specifically, to reduce the effect of OOV, a word-based feature vector, i.e., a glove word vector, is added. The final result will be a word vector of the word vector concatenated to get a resultant representation from word to vector.
And S105, inputting the second word segmentation vector into the HAN model for training.
In the model training method for conversational text classification of this embodiment, model training is performed on data containing the original dialogue sentences, which preserves the completeness of the sentences and makes it easy to determine the relationship between the interlocutors from the surrounding dialogue; word vectors and character vectors are matched simultaneously, alleviating the problem of word ambiguity across different contexts and improving model accuracy; and the hierarchical attention HAN model performs semantic analysis on the conversation content in context, establishing contextual relationships well, so the model's classification precision is higher.
Example 2
As shown in fig. 2, the model training method for conversational text classification in this embodiment is a further refinement of embodiment 1, specifically:
step S101 includes:
s1010, dividing the training data into a plurality of first sample data by using a clustering algorithm;
s1011, extracting a plurality of pieces of data from each piece of first sample data as second sample data;
s1012, calculating the edit distance between any one piece of second sample data and the marked training data;
and S1013, if the editing distance is larger than a preset value, carrying out violation marking on the first sample data to obtain first data.
Labeling conversational text data requires more manpower than other types of text data. In the model training method for conversational text classification of this embodiment, when the training data are obtained, a clustering algorithm first divides the data into k groups; n pieces of data are randomly extracted from each group; the remaining data in the k groups are then traversed, and for each remaining piece the edit distance to a piece randomly drawn from the n extracted pieces is calculated, to see whether it differs greatly from the sampled data; if the difference is large, the piece is not yet represented in the data sample and needs to be added to it. The distance used is the Levenshtein (edit) distance, the preset confidence coefficient is 0.7 or 0.8, and when the calculated edit distance is smaller than the preset confidence coefficient the data are supplemented into the training samples, ensuring sufficient training-sample data while reducing the cost of manual labeling.
Step S102 includes:
s1021, preprocessing the first data to obtain second data;
s1022, comparing the second data with preset model data, and filtering meaningless sentences in the second data to obtain third data;
specifically, the first data is subjected to full angle to half angle, traditional Chinese to simplified Chinese and upper case to lower case operation, and the purpose of the operation is to unify the format of the text content; then, the text with unified format is subjected to word deactivation, such as character removal or low-frequency word removal, null value filtration and sensitive word filtration, so as to reduce the text length and reduce nonsense words to obtain second data; and calculating the similarity of the second data and model data prepared in advance, and further filtering out partial meaningless sentences in the conversation, such as hello, goodbye and the like, wherein the model data is a self-defined meaningless word template. Data preprocessing is performed before word segmentation, and the word segmentation accuracy is further improved.
And S1023, dividing the third data into a plurality of word segments.
The model training method for conversational text classification of this embodiment uses a HanLP (Han Language Processing) tokenizer; to improve segmentation accuracy and coverage for this scenario, a user-defined domain dictionary, with entries such as 'big double bed' and 'pre-authorization', is added to the tokenizer.
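To show how a custom domain dictionary steers segmentation, here is a self-contained greedy longest-match segmenter over such a user dictionary. This is an illustration only; a production system would use HanLP (or a comparable tokenizer) with its own dictionary-loading mechanism rather than this toy algorithm:

```python
def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation: prefer the longest dictionary
    entry starting at each position, falling back to single characters."""
    longest = max((len(w) for w in dictionary), default=1)
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in dictionary:
                out.append(text[i:i + size])
                i += size
                break
    return out
```

With domain entries like 大床房 ("big double bed") and 预授权 ("pre-authorization") in the dictionary, those terms come out as single segments instead of being split character by character.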
Step S105 specifically includes:
and inputting the second word segmentation vector and the preset feature vector into the HAN model for training.
Specifically, the HAN model weights each word in each sentence to obtain a sentence-vector representation of the whole sentence, and then obtains the vector representation of the whole document by weighting and summing the sentences in the document. To enrich the features, additional feature dimensions, such as call duration and order status, are added at the classification layer to further improve model accuracy.
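The two-level weighting can be sketched numerically. In a real HAN the attention scores are produced by learned networks; here they are passed in directly, and the extra feature dimensions are an invented example of the call-duration/order-status idea:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(vectors, scores):
    """Weighted sum of vectors under softmax-normalized attention scores."""
    w = softmax(scores)
    dim = len(vectors[0])
    return [sum(w[i] * vectors[i][d] for i in range(len(vectors)))
            for d in range(dim)]

def document_vector(sentences, word_scores, sent_scores, extra_features):
    """Word-level attention -> sentence vectors; sentence-level attention
    -> document vector; extra features appended before classification."""
    sent_vecs = [attend(words, scores)
                 for words, scores in zip(sentences, word_scores)]
    doc = attend(sent_vecs, sent_scores)
    return doc + extra_features   # extra dimensions at the classification layer
```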
The method for training the dialog text classification model of the embodiment further comprises the following steps:
s1061, judging whether the output result of the model meets a preset threshold value, if so, executing a step S1062: taking the output result of the model as a classification result;
specifically, the preset threshold is set as the number of iterations or the accuracy of the model or there is no more data with a confidence below 0.9.
If not, go to step S1063: labeling the training data with the confidence coefficient of the model output result smaller than the preset threshold value again to obtain fourth data;
specifically, because the artificially labeled data has certain errors, the active learning idea is used to re-label the data of which the confidence coefficient of the model classification result is lower than 0.9, and the artificially re-labeling is specifically adopted.
S1064, inputting the fourth data into the HAN model again, and returning to the step S1601;
specifically, according to the model classification result, the classification with the highest confidence coefficient is selected as the final classification, and the specific classification is divided into two classes with or without violation.
The model training method for conversational text classification of this embodiment uses a clustering algorithm to preprocess the training data, ensuring sufficient training-sample data; performs data preprocessing before word segmentation, further improving segmentation accuracy; adds a user-defined domain dictionary, increasing segmentation accuracy and coverage for this scenario; and adds feature-vector information, further improving the accuracy of the classification results and the precision of the model.
Example 3
As shown in fig. 3, the method for classifying dialog texts of the present embodiment includes the following steps:
s301, obtaining dialogue type text data, wherein the dialogue type text data comprises original dialogue sentences;
s302, segmenting the conversational text data into a plurality of participles;
s303, matching the participles with pre-trained word vectors to obtain first participle vectors;
s304, matching the first word segmentation vector with the pre-trained word vector to obtain a second word segmentation vector;
and S305, inputting the second segmentation vector into the HAN model trained by the dialogue-type text classification model training method of theembodiment 1 or 2 to obtain a classification result.
Step S305 is to input the second segmentation vector and the preset feature vector into the HAN model trained by the dialog-type text classification model training method ofembodiment 1 or 2, so as to obtain a classification result.
The pre-trained word vectors include Elmo word vectors and GloVe word vectors;
the pre-trained character vectors include GloVe character vectors.
The conversational text classification method of this embodiment classifies transcribed conversation data with the trained hierarchical-attention HAN model and introduces factors that strongly influence the judgment of the result, further enriching the feature information and increasing model precision; the class with the larger predicted probability is selected as the final classification result, giving higher classification precision; compared with manual spot checks, the recall rate is improved while precision is maintained; the purpose of violation detection is achieved while a large amount of manpower is saved.
Example 4
Referring to fig. 4, the model training system for conversational text classification in this embodiment includes the following modules:
the firstdata acquisition module 1 is used for carrying out violation annotation on training data to obtain first data, and the first data comprises original dialogue sentences;
the first segmentation obtaining module 2 is used for segmenting the first data into a plurality of segments;
specifically, matching is performed from pre-trained word vectors through the word segmentation result of the previous step, wherein the word vectors comprise a Glove word vector and an Elmo word vector. The Glove word vector and the Elmo word vector are obtained by collecting a large amount of high-quality text data which are open on the internet and text data in the hotel field, preprocessing the high-quality text data and the text data and then training the preprocessed high-quality text data and the preprocessed high-quality text data based on unsupervised learning, and the purpose of obtaining a universal language model is achieved.
The first vector acquisition module 3 is used for matching the word segments with the pre-trained word vectors to obtain a first word segmentation vector;
the secondvector acquisition module 4 is used for matching the first word segmentation vector with the pre-trained word vector to obtain a second word segmentation vector;
specifically, to reduce the effect of OOV, a word-based feature vector, i.e., a Glove word vector, is added. The final result will be a word vector of the word vector concatenated to get a resultant representation from word to vector.
And the model input module 5 is used for inputting the second word segmentation vector into the HAN model for training.
The model training system for conversational text classification of this embodiment performs model training on data containing the original dialogue sentences, which preserves the completeness of the sentences and makes it easy to determine the relationship between the interlocutors from the surrounding dialogue; word vectors and character vectors are matched simultaneously, alleviating the problem of word ambiguity across different contexts and improving model accuracy; and the hierarchical attention HAN model performs semantic analysis on the conversation content in context, establishing contextual relationships well, so the model's classification precision is higher.
Example 5
As shown in fig. 5, the model training system for conversational text classification of this embodiment is a further refinement of embodiment 4, specifically:
the firstdata obtaining module 1 specifically includes:
a first sample acquiring unit 11, configured to divide the training data into a plurality of pieces of first sample data by using a clustering algorithm;
a second sample obtaining unit 12, configured to extract a plurality of pieces of data from each piece of the first sample data as second sample data;
an edit distance calculation unit 13, configured to calculate an edit distance between any one piece of second sample data and the labeled training data;
a first data obtaining unit 14, configured to perform violation marking on the first sample data to obtain first data if the edit distance is greater than the preset value.
Labeling conversational text data requires more manpower than other types of text data. In the model training system for conversational text classification of this embodiment, when the training data are obtained, a clustering algorithm first divides the data into k groups; n pieces of data are randomly extracted from each group; the remaining data in the k groups are then traversed, and for each remaining piece the edit distance to a piece randomly drawn from the n extracted pieces is calculated, to see whether it differs greatly from the sampled data; if the difference is large, the piece is not yet represented in the data sample and needs to be added to it. The distance used is the Levenshtein (edit) distance, the preset confidence coefficient is 0.7 or 0.8, and when the calculated edit distance is smaller than the preset confidence coefficient the data are supplemented into the training samples, ensuring sufficient training-sample data while reducing the cost of manual labeling.
The first segmentation obtaining module 2 specifically includes:
a second data obtaining unit 21, configured to pre-process the first data to obtain second data;
the thirddata acquisition unit 22 is configured to compare the second data with preset model data and filter meaningless sentences in the second data to obtain third data;
specifically, the first data is subjected to full angle to half angle, traditional Chinese to simplified Chinese and upper case to lower case operation, and the purpose of the operation is to unify the format of the text content; then, the text with unified format is subjected to word deactivation, such as character removal or low-frequency word removal, null value filtration and sensitive word filtration, so as to reduce the text length and reduce nonsense words to obtain second data; and calculating the similarity of the second data and model data prepared in advance, and further filtering out partial meaningless sentences in the conversation, such as hello, goodbye and the like, wherein the model data is a self-defined meaningless word template. Data preprocessing is performed before word segmentation, and the word segmentation accuracy is further improved.
a word segmentation obtaining unit 23, configured to segment the third data into a plurality of word segments.
The model training system for conversational text classification of this embodiment uses a HanLP (Han Language Processing) tokenizer; to improve segmentation accuracy and coverage for this scenario, a user-defined domain dictionary, with entries such as 'big double bed' and 'pre-authorization', is added to the tokenizer.
And the model input module 5 is specifically configured to input the second segmentation vector and the preset feature vector to the HAN model.
Specifically, the HAN model weights each word in each sentence to obtain a sentence-vector representation of the whole sentence, and then obtains the vector representation of the whole document by weighting and summing the sentences in the document. To enrich the features, additional feature dimensions, such as call duration and order status, are added at the classification layer to further improve model accuracy.
The model training system for conversational text classification of the present embodiment further includes the following modules:
the result judging module 6 is used for judging whether the output result of the model meets a preset threshold value or not;
specifically, the preset threshold is set as the number of iterations or the accuracy of the model or there is no more data with a confidence below 0.9.
The first classification result acquisition module 7 is used for taking the model output result as a classification result when the model output result meets a preset threshold value;
the fourthdata labeling module 8 is configured to label the training data with the confidence coefficient of the model output result being smaller than the preset threshold again to obtain fourth data when the model output result does not meet the preset threshold;
Specifically, because manually labeled data contains some errors, the idea of active learning is applied: data whose classification confidence from the model is below 0.9 is re-labeled, specifically by manual re-annotation.
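The re-labeling loop described above amounts to routing low-confidence predictions back to annotators. A minimal sketch, where the 0.9 threshold follows the text and the record layout and sample ids are invented for the example:

```python
def split_by_confidence(predictions, threshold=0.9):
    """predictions: list of (sample_id, predicted_label, confidence).
    High-confidence results are accepted; the rest are queued for manual
    re-annotation, as in the active-learning step above."""
    accepted = [p for p in predictions if p[2] >= threshold]
    to_relabel = [p for p in predictions if p[2] < threshold]
    return accepted, to_relabel

preds = [("call-1", "violation", 0.97),
         ("call-2", "no_violation", 0.62),
         ("call-3", "no_violation", 0.91)]
accepted, to_relabel = split_by_confidence(preds)
print([p[0] for p in to_relabel])  # ['call-2']
```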
And the second classification result acquisition module 9 is configured to re-input the fourth data into the HAN model and return to the result judgment module.
Specifically, based on the model's classification results, the class with the highest confidence is selected as the final class; the classification is binary: violation or no violation.
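Selecting the final class is simply an argmax over the per-class confidences; a one-function sketch using the two classes named in the text:

```python
def final_label(class_confidences):
    """class_confidences: dict mapping class name -> model confidence.
    Returns the class with the highest confidence as the final result."""
    return max(class_confidences, key=class_confidences.get)

print(final_label({"violation": 0.23, "no_violation": 0.77}))  # no_violation
```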
The model training system for conversational text classification in this embodiment preprocesses the training data with a clustering algorithm, ensuring sufficient training sample data; data preprocessing before word segmentation further improves segmentation accuracy; the user-defined domain dictionary improves segmentation accuracy and coverage in this domain; and the additional feature vector information further improves the accuracy of the classification results and of the model.
Example 6
As shown in fig. 6, the conversational text classification system of this embodiment uses the model training system for conversational text classification of embodiment 4 or 5, and includes the following modules:
a textdata obtaining module 60, configured to obtain dialogue-type text data, where the dialogue-type text data includes an original dialogue statement;
a secondsegmentation obtaining module 61, configured to segment the conversational text data into a plurality of segments;
a first segmentationvector obtaining module 62, configured to match the segmentation with a pre-trained word vector to obtain a first segmentation vector;
a second segmentationvector obtaining module 63, configured to match the first segmentation vector with the pre-trained word vector to obtain a second segmentation vector;
and a second classificationresult obtaining module 64, configured to input the second segmentation vector to the pre-trained HAN model to obtain a classification result.
The second classificationresult obtaining module 64 is specifically configured to input the second segmentation vector and the preset feature vector to the HAN model trained by the dialog text classification model training system inembodiment 4 or 5, so as to obtain a classification result.
The pre-trained character vectors include ELMo character vectors and GloVe character vectors;
the pre-trained word vectors include GloVe word vectors.
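The stated motivation for combining word-level and character-level vectors is reducing the impact of OOV words: an out-of-vocabulary word can still receive a representation built from its characters. A toy lookup illustrating the fallback, with made-up vectors standing in for GloVe/ELMo embeddings:

```python
def embed_token(token, word_vecs, char_vecs, dim=3):
    """In-vocabulary word: return its word vector. OOV word: average its
    character vectors so it still gets a meaningful representation
    instead of a shared <UNK> vector."""
    if token in word_vecs:
        return word_vecs[token]
    rows = [char_vecs.get(ch, [0.0] * dim) for ch in token]
    return [sum(col) / len(rows) for col in zip(*rows)]

word_vecs = {"取消": [0.1, 0.2, 0.3]}
char_vecs = {"预": [1.0, 0.0, 0.0], "授": [0.0, 1.0, 0.0], "权": [0.0, 0.0, 1.0]}
print(embed_token("取消", word_vecs, char_vecs))   # [0.1, 0.2, 0.3]
print(embed_token("预授权", word_vecs, char_vecs))  # mean of its char vectors
```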
The conversational text classification system of this embodiment classifies transcribed call data with the trained hierarchical-attention HAN model; it introduces factors that strongly influence the judgment, enriching the feature information and increasing model precision; the class with the highest predicted probability is selected as the final classification, giving higher classification precision; compared with manual spot checks, recall is improved while precision is maintained; and the goal of violation detection is achieved while saving a large amount of manual effort.
Example 7
Fig. 7 is a schematic block diagram of an electronic device according to embodiment 7 of the present invention. The electronic device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the model training method for conversational text classification provided in embodiment 1 or 2, or the classification method for conversational text provided in embodiment 3. The electronic device 30 shown in fig. 7 is only an example and should not impose any limitation on the functions or scope of use of embodiments of the present invention.
As shown in fig. 7, theelectronic device 30 may be embodied in the form of a general purpose computing device, which may be, for example, a server device. The components of theelectronic device 30 may include, but are not limited to: the at least oneprocessor 31, the at least onememory 32, and abus 33 connecting the various system components (including thememory 32 and the processor 31).
Thebus 33 includes a data bus, an address bus, and a control bus.
The memory 32 may include volatile memory, such as random access memory (RAM) 321 and/or cache memory 322, and may further include read-only memory (ROM) 323.
The memory 32 may also include a program/utility 325 having a set (at least one) of program modules 324, such program modules 324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these, or some combination thereof, may comprise an implementation of a network environment.
Theprocessor 31 executes various functional applications and data processing, such as the model training method for conversational text classification provided inembodiment 1 or 2 of the present invention or the classification method for conversational text provided inembodiment 3, by running a computer program stored in thememory 32.
The electronic device 30 may also communicate with one or more external devices 34 (e.g., a keyboard, a pointing device, etc.). Such communication may occur through input/output (I/O) interfaces 35. The electronic device 30 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 36. As shown, the network adapter 36 communicates with the other modules of the electronic device 30 via the bus 33. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 30, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, etc.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units/modules described above may be embodied in one unit/module; conversely, the features and functions of one unit/module described above may be further divided into and embodied by a plurality of units/modules.
Example 8
The present embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the model training method for conversational text classification provided in embodiment 1 or 2, or of the classification method for conversational text provided in embodiment 3.
More specific examples of the readable storage medium include, but are not limited to: a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible embodiment, the invention may also be implemented as a program product comprising program code which, when the program product runs on a terminal device, causes the terminal device to carry out the steps of the model training method for conversational text classification provided in embodiment 1 or 2, or of the classification method for conversational text provided in embodiment 3.
The program code for carrying out the invention may be written in any combination of one or more programming languages, and may execute entirely on the user device, partly on the user device as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (18)

CN201911374381.9A2019-12-272019-12-27Model training, classification, system, device and medium for conversational text classificationPendingCN111125317A (en)

Priority Applications (1)

Application number: CN201911374381.9A; priority and filing date: 2019-12-27; title: Model training, classification, system, device and medium for conversational text classification

Publications (1)

Publication number: CN111125317A; publication date: 2020-05-08

Family

ID=70503745

Legal Events

PB01: Publication (application publication date: 2020-05-08)
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication

