CN119719366A


Info

Publication number: CN119719366A
Application number: CN202411539107.3A
Authority: CN
Inventors: 张玉臣; 胡浩; 汪永伟; 范钰丹; 刘鹏程; 周洪伟; 纪然
Original assignee: Information Engineering University Of Chinese People's Liberation Army Cyberspace Force
Current assignee: Information Engineering University Of Chinese People's Liberation Army Cyberspace Force
Priority date: 2024-10-31
Filing date: 2024-10-31
Publication date: 2025-03-28

Abstract

The invention relates to the technical field of natural language processing, and in particular to a method and system for assessing the confidentiality awareness of target personnel based on multivariate sentiment analysis. The method comprises: constructing a confidentiality-domain text corpus and preprocessing its data; constructing a sentiment analysis and judgment model and training it with text word vectors as sample data, the model comprising a BERT layer that mines text vectors based on contextual semantics and the sequential relationship between sentences, a BiLSTM network that extracts bidirectional text features from those vectors, and a fully connected output layer that classifies the text features and outputs the result; and using the model to predict the confidentiality-awareness sentiment tendency category of a target group and to display the prediction visually. By mining the sentiment tendencies implicit in texts related to the confidentiality awareness of target personnel, the method assesses that awareness and can effectively support information security management.

Description

Method and system for assessing the confidentiality awareness of target personnel based on multivariate sentiment analysis

Technical Field

The invention relates to the technical field of natural language processing, and in particular to a method and system for assessing the confidentiality awareness of target personnel based on multivariate sentiment analysis.

Background

The key to confidentiality work is raising personnel's confidentiality awareness. Traditional methods of assessing confidentiality awareness, however, remain qualitative: they usually rely on questionnaires, rating scales and subjective evaluation, lack a scientific evaluation model and a quantitative evaluation system, and therefore struggle to assess personnel's confidentiality awareness objectively and specifically.

Disclosure of Invention

The invention therefore provides a method and system for assessing the confidentiality awareness of target personnel based on multivariate sentiment analysis. Using sentiment analysis and text preprocessing, it mines the sentiment tendencies implicit in texts related to the target personnel's confidentiality awareness, assesses that awareness, and provides effective support for information security management in enterprises and public institutions.

According to the design scheme provided by the invention, in one aspect, a method for assessing the confidentiality awareness of target personnel based on multivariate sentiment analysis is provided, comprising the following steps:

Constructing a confidentiality-domain text corpus and preprocessing its data, so as to optimize the text data in the corpus and convert the texts into text word vectors, wherein each text word vector is the sum of a static word vector, a word position vector and a sentence vector;

Constructing a sentiment analysis and judgment model and training it with text word vectors as sample data, wherein the model comprises a BERT layer that mines the text vectors based on contextual semantics and the sequential relationship between sentences, a BiLSTM network that extracts bidirectional text features from the text vectors, and a fully connected output layer that classifies the text features and outputs the result;

Collecting texts related to confidentiality topics from the target group, predicting the group's confidentiality-awareness sentiment tendency category with the trained sentiment analysis and judgment model, and visually displaying the prediction results.

Further, in the method of the invention, constructing the confidentiality-domain text corpus comprises the following steps:

Collecting texts related to confidentiality topics, including social network comments on confidentiality topics, news related to confidentiality topics, and a self-built confidentiality-topic dataset;

Labeling each confidentiality-topic text with its confidentiality-awareness sentiment tendency category, and identifying the entities in the text through analysis;

Expanding the confidentiality-topic text data by entity, near-synonym and antonym replacement, and building the corpus from the expanded data.

Further, in the method of the invention, expanding the confidentiality-topic text data by entity, near-synonym and antonym replacement comprises the following steps:

Searching the corresponding entity category for replacement entities for the entities in a confidentiality-topic text, and generating new text data;

Performing part-of-speech analysis on the confidentiality-topic text, locating words of the parts of speech eligible for replacement, replacing them with their near-synonyms and/or antonyms, and generating corresponding new labels, thereby producing new text data.

Further, in the method of the invention, preprocessing the data of the confidentiality-domain text corpus comprises the following steps:

Screening the texts in the corpus to remove invalid texts and redundant content, wherein invalid texts include texts whose relevant fields are empty and texts unrelated to confidentiality topics, and redundant content includes missing values, duplicate values and emoticons;

Cleaning the text data with regular expression matching to remove meaningless characters;
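The screening and cleaning steps above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the relevance keywords, the emoji Unicode ranges and the set of control characters are assumptions chosen for the example.

```python
import re

# Hypothetical relevance keywords; the source does not list the terms it uses.
TOPIC_KEYWORDS = ("保密", "泄密", "涉密", "信息安全")

HTML_TAG = re.compile(r"<[^>]+>")                                # e.g. </SUB>, <br/>
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")      # common emoji blocks (assumed)
CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\ufffd]")  # control chars and U+FFFD
SPACES = re.compile(r"\s+")


def clean_text(text):
    """Strip HTML tags, emoji and control characters, then collapse whitespace."""
    text = HTML_TAG.sub("", text)
    text = EMOJI.sub("", text)
    text = CONTROL.sub("", text)
    return SPACES.sub(" ", text).strip()


def filter_corpus(texts):
    """Drop missing, empty, off-topic and duplicate texts; return cleaned texts."""
    seen, kept = set(), []
    for raw in texts:
        if raw is None:                                   # missing value
            continue
        text = clean_text(raw)
        if not text:                                      # empty after cleaning
            continue
        if not any(k in text for k in TOPIC_KEYWORDS):    # unrelated to the topic
            continue
        if text in seen:                                  # duplicate value
            continue
        seen.add(text)
        kept.append(text)
    return kept
```

Cleaning runs before the duplicate check so that two texts differing only in markup or emoji are treated as duplicates.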

Splitting, through data blocking, long texts and large files that exceed a threshold into a specified number of short texts and small files;
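Data blocking can be sketched as a greedy packer that splits text at sentence-ending punctuation and hard-splits any sentence that is still too long. The function name `split_into_chunks` and the 126-character default (leaving room for [CLS] and [SEP] within a 128-token limit) are hypothetical choices, not values from the source.

```python
import re

# Split *after* Chinese or Western sentence-ending punctuation (zero-width match).
SENTENCE_END = re.compile(r"(?<=[。！？!?；;])")


def split_into_chunks(text, max_chars=126):
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    sentences = [s for s in SENTENCE_END.split(text) if s]
    chunks, current = [], ""
    for sent in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sent) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sent[:max_chars])
            sent = sent[max_chars:]
        if len(current) + len(sent) > max_chars:
            chunks.append(current)
            current = sent
        else:
            current += sent
    if current:
        chunks.append(current)
    return chunks
```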

Segmenting the text data into tokens with WordPiece and masking the segmentation results; adding a classification token at the beginning of each token sequence, whose final-layer output represents the information of the whole sequence; inserting a separator token between sentences within the same sequence; and adding to each token an embedding that indicates its position;

Converting token sequences of different lengths into word vectors of a standard length.

Further, in the method of the invention, the fully connected output layer comprises a fully connected layer that converts the multi-dimensional feature vector into a specified low-dimensional feature vector, and an output layer that classifies the low-dimensional feature vector by sentiment degree and type to determine the confidentiality-awareness sentiment tendency category. The output layer uses a softmax function as the classifier: it computes the probability of each confidentiality-awareness sentiment tendency category and selects the category with the largest probability as the final output.

Further, in the method of the invention, training the sentiment analysis and judgment model with text word vectors as sample data comprises the following steps:

Dividing the text word vectors into a training sample set and a test sample set according to a specified ratio;

Iteratively training the sentiment analysis and judgment model on the text word vectors in the training sample set based on a preset loss function, and using the test sample set to evaluate, tune and optimize the trained model until its training effect meets the expected requirements.

Further, in the method of the invention, visually displaying the prediction results comprises the following step:

Performing statistical analysis and visual display, on a visualization platform, of the confidentiality-awareness sentiment tendency categories predicted for multiple texts, wherein the platform uses a browser/server (B/S) architecture and the statistical analysis covers both the structure of the texts and the distribution of the predicted categories.

In another aspect, the invention further provides a system for assessing the confidentiality awareness of target personnel based on multivariate sentiment analysis, comprising a corpus construction module, a model training module and a target assessment module, wherein:

The corpus construction module is used to construct a confidentiality-domain text corpus and preprocess its data, so as to optimize the text data in the corpus and convert the texts into text word vectors, wherein each text word vector is the sum of a static word vector, a word position vector and a sentence vector;

The model training module is used to construct a sentiment analysis and judgment model and train it with text word vectors as sample data, wherein the model comprises a BERT layer that mines the text vectors based on contextual semantics and the sequential relationship between sentences, a BiLSTM network that extracts bidirectional text features from the text vectors, and a fully connected output layer that classifies the text features and outputs the result;

The target assessment module is used to collect texts related to confidentiality topics from the target group, predict the group's confidentiality-awareness sentiment tendency category with the trained sentiment analysis and judgment model, and visually display the prediction results.

Beneficial effects of the invention:

The invention collects text data, such as comments, opinions and suggestions, from inside and outside an enterprise or organization and, after data processing, performs multivariate sentiment analysis. A BERT-BiLSTM model measures the sentiment tendencies of personnel regarding confidentiality, uncovering their level of confidentiality awareness and possible risks and problems, so that the overall confidentiality awareness of the relevant personnel can be assessed and targeted improvements to information security management can be supported. Experimental data further show that, compared with the traditional sentiment analysis model FastText, the BERT-BiLSTM model of this scheme achieves higher accuracy and practicality in sentiment analysis of texts on confidentiality topics, has certain advantages over other models, and can effectively support confidentiality work.

Drawings

FIG. 1 is a schematic diagram of the confidentiality awareness assessment flow for target personnel based on multivariate sentiment analysis in an embodiment;

FIG. 2 is a flow diagram of text data expansion in an embodiment;

FIG. 3 is a schematic diagram of the sentiment analysis and judgment model in an embodiment;

FIG. 4 is a schematic diagram of the BERT layer structure in an embodiment;

FIG. 5 is a schematic diagram of the Transformer structure in an embodiment;

FIG. 6 is a diagram of the BiLSTM network architecture in an embodiment;

FIG. 7 is a schematic diagram of the confidentiality awareness assessment algorithm in an embodiment;

FIG. 8 is a visual presentation of the sentiment analysis of multiple texts in an embodiment;

FIG. 9 is an example fragment of the experimental dataset in an embodiment.

Detailed Description

To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and technical solutions.

The key to confidentiality work is raising personnel's confidentiality awareness. Traditional assessment methods remain qualitative: they usually rely on questionnaires and rating scales, lack a scientific evaluation model and a quantitative evaluation system, and struggle to assess personnel's confidentiality awareness objectively. The embodiment of the invention therefore provides a method for assessing the confidentiality awareness of target personnel based on multivariate sentiment analysis, which specifically comprises the following:

Referring to FIG. 1, the process from collecting the original texts to the final visual analysis proceeds in stages. In the text collection stage, social network comments on confidentiality topics, news related to confidentiality topics and texts from a self-built dataset are obtained and combined into a confidentiality-domain text corpus, whose texts serve as the input of the data preprocessing stage. The data preprocessing stage has two main tasks: first, optimizing the text data in the corpus through invalid-text removal and data cleaning, to improve the efficiency and accuracy of subsequent processing; second, converting the texts, through word segmentation, padding, word embedding and vectorization, into labeled word vectors that can be fed directly to the next layer. In the sentiment analysis and judgment stage, a BERT pre-trained model produces vector representations carrying contextual semantic information, and the [CLS] output is selected as the input of the BiLSTM network. The BiLSTM output passes through the fully connected layer to the output layer, and in the visualization stage users interact with the results, completing the visual display of the classification results.

The confidentiality-domain text corpus may be constructed as follows.

In this embodiment, because little information about confidentiality is available online, part of the text data comes from social network comments and from news texts published by official accounts and media outlets, and the other part comes from self-built confidentiality-topic text data. To obtain more data, the dataset is expanded by near-synonym/antonym replacement and by entity replacement.

Specifically, expanding the confidentiality-topic text data by entity, near-synonym and antonym replacement may be designed as follows.

As shown in FIG. 2, entity recognition and part-of-speech analysis are performed on the segmented original labeled texts. For entities, replacement entities are looked up in the corresponding entity category to generate new texts; for words of different parts of speech, near-synonyms and antonyms are looked up as replacements, and a new label is generated by matching against the original label. The newly labeled text data are then verified by manual review. Table 1 gives examples of corpus replacement.
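A minimal sketch of the replacement-based expansion is shown below, assuming pre-segmented tokens, toy lexicons in place of a real synonym/antonym dictionary and entity recognizer, and the label scheme used in the experiments (0 neutral, 1 negative, 2 positive). The rule that an antonym flips a positive or negative label while entity and near-synonym replacement keep it is an assumption consistent with Table 1; generated samples would still go through manual review.

```python
# Hypothetical toy lexicons standing in for a real synonym/antonym
# dictionary and an entity recognizer (neither is named in the source).
ENTITY_CLASSES = {"company": "ORG", "unit": "ORG", "bureau": "ORG"}
SYNONYMS = {"sensitive": ["private"]}
ANTONYMS = {"good": ["poor"], "poor": ["good"]}

NEUTRAL, NEGATIVE, POSITIVE = 0, 1, 2            # label scheme from the experiments
FLIP = {NEUTRAL: NEUTRAL, NEGATIVE: POSITIVE, POSITIVE: NEGATIVE}


def replace_at(tokens, i, new):
    """Return a copy of tokens with position i replaced by new."""
    return tokens[:i] + [new] + tokens[i + 1:]


def augment(tokens, label):
    """Yield (new_tokens, new_label) pairs derived from one labeled sample."""
    for i, tok in enumerate(tokens):
        cls = ENTITY_CLASSES.get(tok)
        if cls is not None:                      # entity -> another entity of the same class
            for other, other_cls in ENTITY_CLASSES.items():
                if other != tok and other_cls == cls:
                    yield replace_at(tokens, i, other), label
        for syn in SYNONYMS.get(tok, []):         # near-synonym keeps the label
            yield replace_at(tokens, i, syn), label
        for ant in ANTONYMS.get(tok, []):         # antonym flips the polarity
            yield replace_at(tokens, i, ant), FLIP[label]
```

For the second row of Table 1, replacing "good" with "poor" in a positive sample yields a negative sample.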

Table 1 Corpus replacement examples

Original corpus	Derived corpus
Even though the company is slightly sensitive, I do not want to see it	Even though the company is slightly private, I do not want to see it
These regulations in the unit feel good	These regulations in the unit also feel poor

The data of the confidentiality-domain text corpus may be preprocessed as follows.

The data preprocessing stage comprises six parts: invalid-text removal, data cleaning, sentence segmentation, padding, word embedding and vectorization. Invalid-text removal screens the texts in the corpus extracted in the previous stage, filtering out texts whose relevant fields are empty and texts unrelated to confidentiality topics, and removing missing values, duplicate values and emoticons. This avoids problems during program execution that would affect the model's training and test results, further optimizes the internal structure of the data, and improves the model's processing efficiency and accuracy. Data cleaning mainly uses regular expression matching to remove HTML tag characters, illegal Unicode characters and meaningless strings such as </SUB> from the crawled data, which also tidies the data format. Sentence segmentation first divides the data into blocks: long texts and large files are split into short texts and small files and grouped into a certain number of parts, which somewhat improves the subsequent generation of training data. Word embedding then splits each sentence and converts it into a token representation, the canonical input of the BERT model. Because most semantics in Chinese are expressed by words rather than single characters, the whole-word masking (WWM) mechanism, which introduces Chinese word-level information, is needed for the Chinese masked language modeling (MLM) task. Concretely, the text is first tokenized with WordPiece, the result is masked, and a special classification token ([CLS]) is added at the beginning of each sequence; the final-layer output corresponding to [CLS] represents the information of the whole sequence. In addition, a separator token ([SEP]) is inserted between sentences in the same sequence, and each token receives an embedding that indicates its position, so that different sentences can be distinguished. Table 2 gives an example of the token representation.
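The token layout described above can be illustrated with a small greedy longest-match-first WordPiece tokenizer. This is a simplified sketch with a hypothetical toy vocabulary: it omits masking, and Google's Chinese BERT splits each Chinese character into its own token before WordPiece is applied, so the English example is used only for readability.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            cand = word[start:end]
            if start > 0:
                cand = "##" + cand              # continuation pieces carry a ## prefix
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:                       # nothing matches: the whole word is unknown
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


def build_input(sent_a, sent_b, vocab):
    """Lay out [CLS] A [SEP] (B [SEP]) with segment and position ids.

    sent_a and sent_b are lists of pre-split words; sent_b may be empty.
    """
    tokens = ["[CLS]"]
    for w in sent_a:
        tokens += wordpiece(w, vocab)
    tokens.append("[SEP]")
    segment_ids = [0] * len(tokens)             # sentence A, including [CLS] and its [SEP]
    if sent_b:
        for w in sent_b:
            tokens += wordpiece(w, vocab)
        tokens.append("[SEP]")
        segment_ids += [1] * (len(tokens) - len(segment_ids))
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids
```

With a vocabulary containing "confident", "##ial" and "##ity", the word "confidentiality" becomes three pieces, all in the second segment.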

Table 2 BERT word embedding structure

The padding and vectorization parts convert texts of different lengths into a vector matrix of standard length so that the BERT layer can process them.
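Padding and truncation to a standard length can be sketched as follows, assuming each sequence already begins with [CLS] and ends with [SEP] and that the padding ID is 0. The attention mask marks real tokens with 1 so that padded positions can be ignored. The function names and the token IDs in the check are placeholders.

```python
def pad_or_truncate(ids, max_len=128, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_len long.

    When truncating, the final [SEP] is kept so the sequence stays well-formed.
    """
    if len(ids) > max_len:
        ids = ids[:max_len - 1] + ids[-1:]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [pad_id] * (max_len - len(ids))
    return ids, mask


def to_matrix(batch, max_len=128, pad_id=0):
    """Stack variable-length id lists into equal-length rows plus their masks."""
    pairs = [pad_or_truncate(ids, max_len, pad_id) for ids in batch]
    return [p[0] for p in pairs], [p[1] for p in pairs]
```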

The main task of sentiment analysis is to identify the sentiment information in a text. The text sentiment classification model constructed here, shown in FIG. 3, has four layers: a BERT layer, a Bi-LSTM layer, a fully connected layer and an output layer. The BERT layer obtains the feature representation of the text vectors, which is then fed to the BiLSTM layer to extract the sentiment features of the text; finally, the classifier classifies the extracted features. The BERT model compensates for the BiLSTM's limited ability to attend to the contextual information of the text.

In this embodiment, a BERT pre-trained model is used for text preprocessing to obtain comprehensive word vector representations; the BERT structure is shown in FIG. 4. In the BERT model, the input vector is the sum of three different vectors. The first is the static word vector (Token Embedding), which can be obtained with techniques such as Word2vec. The second is the position vector (Position Embedding), which embeds and retains the relative or absolute position of each word in the text. The third is the sentence vector (Segment Embedding); because each input here is a single sentence, only one sentence vector is used. A [CLS] flag vector is added to each sentence to represent the information of the whole sentence, and a sentence-end flag vector [SEP] separates two sentences in the text for later processing.
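The sum of the three embeddings can be sketched with NumPy lookup tables. The tables here are randomly initialized placeholders (in a pre-trained model they are learned), the vocabulary size is arbitrary, and the layer normalization and dropout that BERT applies after the sum are omitted; the width 768 and maximum length 128 follow the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MAX_LEN, N_SEG, D = 1000, 128, 2, 768   # VOCAB is a placeholder size

token_table = rng.normal(0.0, 0.02, (VOCAB, D))      # Token Embedding
position_table = rng.normal(0.0, 0.02, (MAX_LEN, D))  # Position Embedding
segment_table = rng.normal(0.0, 0.02, (N_SEG, D))     # Segment Embedding


def embed(token_ids, segment_ids):
    """Input matrix E: one row per token, the sum of its three embeddings."""
    token_ids = np.asarray(token_ids)
    segment_ids = np.asarray(segment_ids)
    positions = np.arange(len(token_ids))
    return token_table[token_ids] + position_table[positions] + segment_table[segment_ids]
```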

The BERT model is built mainly from the encoder of a bidirectional Transformer, shown in FIG. 5, and the self-attention mechanism is the core technique of the BERT encoder. BERT relies on matrix operations: the input vectors are stacked into a vector matrix E = {e₁, e₂, …, e_n}, which is passed to the Transformer, whose outputs are given by formulas (1) and (2).

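$$Q = EW^Q,\quad K = EW^K,\quad V = EW^V \tag{1}$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_K}}\right)V \tag{2}$$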

where Q is the query matrix, K and V are the key and value matrices, W^Q, W^K and W^V are linear transformation matrices, and d_K is the vector dimension. The softmax function normalizes each row, so that each row vector expresses the importance of every word to a given word, and the scaling factor 1/√d_K prevents the inner products in QK^T from becoming too large.

Computing the self-attention output for each of several heads and combining them yields the output of the multi-head attention mechanism, denoted X.
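Formulas (1) and (2) and the multi-head combination can be sketched in NumPy as below. Splitting the projected width evenly across heads and mixing the concatenated heads with an output matrix `W_o` follow the standard Transformer design; the function names and shapes are illustrative assumptions. BERT-base would use d_model = 768 with 12 heads of width 64.

```python
import numpy as np


def softmax(x, axis=-1):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_K)) V; row i of the weights sums to 1."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights


def multi_head_attention(E, W_q, W_k, W_v, W_o, n_heads):
    """E: (n, d_model); every W_*: (d_model, d_model). Returns X: (n, d_model)."""
    n, d_model = E.shape
    d_head = d_model // n_heads
    Q, K, V = E @ W_q, E @ W_k, E @ W_v          # linear projections of E
    heads = []
    for h in range(n_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's share of the width
        out, _ = scaled_dot_product_attention(Q[:, cols], K[:, cols], V[:, cols])
        heads.append(out)
    return np.concatenate(heads, axis=1) @ W_o   # concatenate heads, then mix
```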

The output matrix of the last BERT layer is denoted T = {t₁, t₂, …, t_n}. T has the same dimensions as the input matrix E; each row is a deep representation of one token, and T serves as the input of the Bi-LSTM network.

The bidirectional encoder structure means that, when the model processes a word, it can draw on the semantics of the other words in its context on both sides. BERT also masks words in the input and, through next-sentence prediction, learns the sequential relationship between sentences.

In this embodiment, Google's open-source Chinese BERT pre-trained model "BERT-base" can be used. It has a 12-layer Transformer network with 12 attention heads, an output vector dimension of 768 and a maximum sequence length of 128; shorter inputs are padded. Each Transformer layer consists of a multi-head self-attention mechanism and a fully connected feed-forward network: the data first pass through the multi-head attention layer to obtain weighted feature vectors, which are then sent to the feed-forward layer. After feature extraction by the bidirectional encoder, the word-level vector representation of the text is output and passed to the BiLSTM as its input for the subsequent training task.

In the BiLSTM sentiment feature extraction layer, the LSTM replaces each node of a conventional RNN with a special structure, the memory cell. A BiLSTM stacks a forward LSTM and a backward LSTM that read the sequence in opposite directions, which makes it well suited to capturing bidirectional semantics: the forward LSTM captures forward semantic information, the backward LSTM captures reverse semantic information, and the output is determined jointly by the states of both, as shown in FIG. 6.

After the input layer supplies the vectors, the bidirectional LSTM performs a forward and a backward computation. The front-to-back update formula of the LSTM is:

The back-to-front update formula is:

The output formula after superposition is:

where y is the output of the Bi-LSTM after n time steps, W denotes the network weights, b the bias, and H the number of hidden units.
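The forward, backward and combined computations can be sketched with the standard LSTM formulation (input, forget and output gates plus a candidate cell state), concatenating the forward and backward hidden states at each step. The gate ordering, parameter shapes and function names are assumptions and may differ from the patent's notation. In Keras, which the experiments use, a comparable layer is `keras.layers.Bidirectional(keras.layers.LSTM(128, return_sequences=True))`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(X, W, U, b):
    """Run one LSTM direction over X of shape (T, d_in).

    W: (d_in, 4H), U: (H, 4H), b: (4H,); gate order is i, f, g, o.
    """
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x_t in X:
        z = x_t @ W + h @ U + b
        i = sigmoid(z[:H])                 # input gate
        f = sigmoid(z[H:2 * H])            # forget gate
        g = np.tanh(z[2 * H:3 * H])        # candidate cell state
        o = sigmoid(z[3 * H:])             # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def bilstm(X, fwd_params, bwd_params):
    """Concatenate front-to-back and back-to-front hidden states: (T, 2H)."""
    h_fwd = lstm_pass(X, *fwd_params)
    h_bwd = lstm_pass(X[::-1], *bwd_params)[::-1]   # re-align to forward time order
    return np.concatenate([h_fwd, h_bwd], axis=1)
```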

In this embodiment, the first BiLSTM layer has 128 hidden units and the second has 96; the first fully connected layer has an output dimension of 32 and the second an output dimension of 2. During training, cross-entropy is used as the loss function and the model parameters are updated by back-propagation.
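The cross-entropy loss and the gradient that back-propagation starts from can be sketched as follows. For a softmax output, the gradient of the mean cross-entropy with respect to the logits is the standard identity (softmax − one-hot)/n. Function names are illustrative.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # stability shift; does not change the result
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch; labels are integer class ids."""
    p = softmax(logits)
    n = len(labels)
    return -np.mean(np.log(p[np.arange(n), labels]))


def cross_entropy_grad(logits, labels):
    """Gradient of the mean loss with respect to the logits."""
    p = softmax(logits)
    n = len(labels)
    p[np.arange(n), labels] -= 1.0           # softmax minus one-hot targets
    return p / n
```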

The Bi-LSTM network outputs a sequence of bidirectional hidden states. The fully connected layer transforms this multi-dimensional feature vector into a low-dimensional feature vector and passes it to the output layer, whose main task is to classify the text by the degree and type of sentiment so as to assess the confidentiality awareness it expresses. The output layer receives the Bi-LSTM output after dimension reduction by the fully connected layer, uses a softmax function as the classifier to normalize the feature vector into a probability for each sentiment category, and selects the category with the largest probability as the model's final output. In this embodiment, texts are divided into three categories: higher, general and lower confidentiality awareness. Specifically, the trained model classifies texts into positive, general and negative categories, and texts are further classified according to specific words that appear in them (for example, "confidential" and "secret").
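The dimension reduction and softmax decision can be sketched as below. ReLU in the first fully connected layer, the mapping of label IDs 0/1/2 (neutral/negative/positive, as in the experiments) to general/lower/higher awareness, and a three-way output sized for the three categories are assumptions for illustration; the keyword-based refinement is not shown.

```python
import numpy as np

# Assumed mapping: label 0 neutral -> general, 1 negative -> lower, 2 positive -> higher.
CATEGORIES = [
    "general confidentiality awareness",
    "lower confidentiality awareness",
    "higher confidentiality awareness",
]


def classify(features, W1, b1, W2, b2):
    """Two fully connected layers, softmax, then the most probable category."""
    hidden = np.maximum(0.0, features @ W1 + b1)   # dimension reduction (ReLU assumed)
    logits = hidden @ W2 + b2                      # one logit per category
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return probs, [CATEGORIES[k] for k in probs.argmax(axis=-1)]
```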

The visual display of the prediction results may be designed as follows.

The task of visualization is to perform statistical analysis on the assessment results and display them visually so that users can view and understand them easily. The visualization platform uses a B/S architecture: the server side consists of sentiment analysis servers, the front end is built with JavaScript, and data visualization is implemented with ECharts. The platform mainly displays the structure of the input texts, such as the distribution of average sentence length and keyword frequencies; on the basis of further analysis of the confidentiality awareness classification results, it also provides pie charts, bar charts and similar views of the proportions of texts at each level of confidentiality awareness, giving an intuitive picture of personnel's confidentiality awareness.
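The statistics behind the dashboard can be computed on the server side and handed to the front end. The sketch below computes average sentence length, keyword frequencies and category proportions, and shapes the category counts like an ECharts pie series (`type: "pie"`, `data` entries with `name` and `value`). The function name and the sentence delimiters are assumptions.

```python
import re
from collections import Counter


def dashboard_stats(texts, predictions, keywords):
    """Text-structure statistics and category shares for the visualization front end."""
    lengths = [len(s) for t in texts for s in re.split(r"[。！？!?]", t) if s]
    counts = Counter(predictions)
    n = len(predictions)
    return {
        "avg_sentence_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "keyword_freq": {k: sum(t.count(k) for t in texts) for k in keywords},
        "category_share": {c: counts[c] / n for c in counts},
        # shaped like an ECharts pie series: series[].data of {name, value}
        "pie_series": {"type": "pie",
                       "data": [{"name": c, "value": counts[c]} for c in counts]},
    }
```

The resulting dictionary can be serialized to JSON and merged into the chart options on the JavaScript side.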

Further, based on the above method, the embodiment of the invention also provides a system for assessing the confidentiality awareness of target personnel based on multivariate sentiment analysis, comprising a corpus construction module, a model training module and a target assessment module.

As shown in FIG. 7, the overall algorithm flow has four parts: text extraction, model training, analysis and judgment, and result display. Texts on specific confidentiality topics are extracted through data crawling and data construction from social network platforms such as Weibo, knowledge-sharing communities and other online text sources. The preprocessed texts are divided into a training set and a test set according to a certain ratio; the training set is used to train the BERT-BiLSTM sentiment analysis and judgment model, and the test set is used to evaluate, tune and optimize the model. The preprocessed training data are fed into the model and pass in turn through the BERT layer, the BiLSTM layer and the fully connected layer: the BERT layer completes the data preprocessing, its output serves as the BiLSTM input for extracting bidirectional text information, and the parameters are iterated repeatedly according to the loss function to optimize the model until its training effect meets the requirements. Unlabeled texts on personnel confidentiality topics, such as texts on leak incidents and evaluations of confidentiality regulations, are then input into the model, which classifies confidentiality awareness by sentiment degree and outputs the personnel's confidentiality awareness level. The results are visualized with ECharts, and the different confidentiality awareness levels reflected by the data are analyzed further, first the structure of the texts and then the proportions of the different levels, so that the assessment results are presented intuitively to the user. For a single text, the model outputs the classification result of that text; for multiple texts, the data are further analyzed and processed and the results displayed, as shown in FIG. 8.

To verify the effectiveness of this scheme, it is further explained below with experimental data.

In the experiment, Python was used to collect comments, news and other content on topics such as confidentiality, leaks and information security from websites including Douyin, Kuaishou, Xuexi Qiangguo, Weibo, The Paper and confidentiality-themed platforms. On this basis, a confidentiality-topic corpus was formed through data expansion and fusion with the self-built dataset.

The experiment involved several preprocessing steps: removing invalid characters from the experimental data, labeling sentiment (label 0 for neutral, label 1 for negative, label 2 for positive), deleting emoji characters, removing stop words, normalizing words and splitting the data. From the preprocessed data, 1511 items were selected as the experimental dataset, of which 535 are labeled neutral, 426 negative and 350 positive. The data for each label were then divided into training, validation and test sets at a ratio of about 10:1:1, giving 1311 training, 100 validation and 100 test items; part of the data is shown in FIG. 9. Finally, the test set was used to verify the model's performance. Several toolkits were used to build the workflow, and statistical evaluation metrics were used to measure and analyze model performance.
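A per-label (stratified) split at roughly 10:1:1 can be sketched as follows. The shuffling seed, the rounding rule and the function name are assumptions, so the resulting set sizes depend on the label counts and need not match the sizes reported above.

```python
import random
from collections import defaultdict


def stratified_split(samples, ratios=(10, 1, 1), seed=42):
    """Split (text, label) pairs into train/val/test, label by label, in the given ratio."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    rng = random.Random(seed)
    total = sum(ratios)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_val = round(len(group) * ratios[1] / total)
        n_test = round(len(group) * ratios[2] / total)
        val += group[:n_val]
        test += group[n_val:n_val + n_test]
        train += group[n_val + n_test:]          # the remainder goes to training
    return train, val, test
```

Splitting within each label keeps the neutral/negative/positive proportions roughly equal across the three sets.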

All experiments were run in the same configuration, on a Lenovo Legion Y7000 laptop (GTX 1060 configuration) with an Intel i7 processor and 8 GB of memory; the detailed software and hardware configuration is given in Table 3.

Table 3 Experimental operating environment

Mathematical and statistical tools were used in the experiments. After the mathematical model was built, it was implemented in the Python programming language; Python 3.6 was the coding tool of choice for this experiment. Scikit-learn was used to make the data compatible with the algorithms, and Keras was used to build the BiLSTM model and, most importantly, to connect the BiLSTM with the rest of the neural network. The experimental parameters consist of two parts, those of the BERT model and those of the BiLSTM model, as shown in Table 4.

Table 4 Model parameters

The self-built dataset was used for testing. With suitable evaluation indices and a comparison against other models, the test results show that the scheme achieves higher accuracy and practicality.

The training results of the model were evaluated mainly with four indices: accuracy A (Accuracy), precision P (Precision), recall R (Recall) and the F1 score (F-score).

These indices are computed from the confusion matrix (Confusion Matrix) of the classification results, shown in Table 5:

Table 5 Confusion matrix

In the confusion matrix, TP (true positive) is the number of samples whose original label is positive and which are classified as positive, i.e. correctly predicted positive samples; FN (false negative) is the number of samples whose original label is positive but which are classified as negative, i.e. misclassified positive samples; FP (false positive) is the number of samples whose original label is negative but which are classified as positive, i.e. misclassified negative samples; and TN (true negative) is the number of samples whose original label is negative and which are classified as negative, i.e. correctly classified negative samples.
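Because the data set has three labels while the confusion matrix is defined for a positive and a negative class, one common approach (assumed here, not stated in the text) is to treat each label in turn as the positive class and all other labels as negative. The hypothetical helper `confusion_counts` below counts the four cells for one such positive class:

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int
) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN), treating `positive` as the positive class
    and every other label as negative (one-vs-rest)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1  # positive sample predicted positive
        elif t == positive:
            fn += 1  # positive sample predicted negative
        elif p == positive:
            fp += 1  # negative sample predicted positive
        else:
            tn += 1  # negative sample predicted negative
    return tp, fn, fp, tn
```

For example, with label 2 (positive) as the positive class, `confusion_counts([2, 2, 1, 0, 2, 1], [2, 1, 2, 0, 2, 1], positive=2)` returns `(2, 1, 1, 2)`.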

Accuracy is the proportion of correctly classified samples, both positive and negative, among all samples after model classification. It is calculated as:

$$A = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision and recall are typically used together: precision is defined with respect to the prediction results, while recall is defined with respect to the original samples. Precision is the proportion of samples that are actually positive among the samples predicted to be positive:

$$P = \frac{TP}{TP + FP}$$

Recall is the proportion of actually positive samples that the classifier correctly predicts as positive:

$$R = \frac{TP}{TP + FN}$$

The F1 score combines precision and recall into a single index; it is their harmonic mean:

$$F1 = \frac{2 \times P \times R}{P + R}$$
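The metric definitions above can be checked with a short sketch. The function names are hypothetical and the counts in the example are invented for illustration. The text does not state how the three-class results in Table 6 were averaged; `macro_average` assumes unweighted (macro) averaging of per-class scores.

```python
def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """A = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + fn + fp + tn)


def precision(tp: int, fp: int) -> float:
    """P = TP / (TP + FP); defined as 0 when nothing is predicted positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """R = TP / (TP + FN); defined as 0 when there are no positive samples."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_average(counts_per_class):
    """Unweighted mean of per-class P, R and F1 (macro averaging, an assumption).
    Each entry is a (TP, FN, FP, TN) tuple for one class treated as positive."""
    ps, rs, fs = [], [], []
    for tp, fn, fp, tn in counts_per_class:
        p, r = precision(tp, fp), recall(tp, fn)
        ps.append(p)
        rs.append(r)
        fs.append(f1(p, r))
    n = len(counts_per_class)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n
```

With hypothetical counts TP = 40, FN = 10, FP = 5, TN = 45, accuracy is 0.85, precision is 40/45, recall is 0.8 and F1 is 80/95.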

In addition, other models were set up and trained to further analyze the experimental results. Three models, TextCNN, LSTM and BERT-BiLSTM, were trained on the same data set and compared on the emotion analysis task, in order to verify the strong text representation capability of the BERT-BiLSTM model of this scheme.

Second, the model constructed in this scheme is compared with other proposed emotion analysis models to verify whether it improves emotion classification accuracy over traditional deep learning models.

The training set contains 1311 samples, and the validation and test sets contain 100 samples each. During training, the iter parameter denotes the number of iterations and the epoch parameter denotes the number of training rounds, which is set to 3. The loss value was recorded every 10 iterations; the model improved 6 times during training, with each improvement marked by a star, and training took 17 minutes in total. As a control experiment, the classical FastText model was trained on the same data set with the following training parameters: a randomly initialized embedding layer, dropout of 0.5 and 20 training rounds; training took 44 minutes in total.
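The evaluation schedule described above, recording the loss every 10 iterations and marking each improvement with a star, resembles a common best-checkpoint pattern. The sketch below shows that pattern on a list of recorded validation losses; `mark_improvements`, `format_log` and the loss values are hypothetical, and whether the original training script saved a checkpoint at each starred point is an assumption.

```python
from typing import List, Sequence


def mark_improvements(val_losses: Sequence[float]) -> List[bool]:
    """Flag each evaluation point whose loss beats the best loss seen so far,
    i.e. the points where a checkpoint would be saved and a '*' printed."""
    best = float("inf")
    flags = []
    for loss in val_losses:
        improved = loss < best
        if improved:
            best = loss
        flags.append(improved)
    return flags


def format_log(val_losses: Sequence[float], eval_every: int = 10) -> List[str]:
    """Render one log line per evaluation, with a trailing '*' on improvements."""
    lines = []
    flags = mark_improvements(val_losses)
    for k, (loss, improved) in enumerate(zip(val_losses, flags), start=1):
        star = "*" if improved else ""
        lines.append(f"Iter {k * eval_every:>5}  val_loss {loss:.4f} {star}".rstrip())
    return lines
```

For example, `mark_improvements([0.9, 0.7, 0.75, 0.6, 0.6, 0.5])` flags four improvements; counting the starred lines of such a log is one way a figure like "improved 6 times" could be read off.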

The emotion analysis results of the different models are shown in Table 6.

Table 6 Statistics of experimental test results

Model	Accuracy (%)	Recall (%)	F1-score (%)
FastText	80.63	79.44	79.84
BERT-BiLSTM	95.06	95.56	95.19

In the comparative experiment, the FastText and BERT-BiLSTM models were trained and their results compared. Compared with FastText, BERT-BiLSTM improves emotion classification accuracy by 14.43 percentage points, recall by 16.12 percentage points and the F1 score by 15.35 percentage points. In addition, BERT-BiLSTM needed only 3 training epochs, whereas FastText used 20; although FastText trains faster than the model of this scheme for the same number of epochs, its accuracy is considerably lower. BERT-BiLSTM selects features based on the context of each word and dynamically adjusts word vectors as the context changes; taken together, the metrics indicate that the BERT pre-trained model is more beneficial to the model's extraction of textual information.
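The improvements can be recomputed directly from Table 6 as absolute differences in percentage points; the recall values in the table differ by 16.12 points. The values below are copied from Table 6, and the helper name `improvements` is hypothetical.

```python
# Table 6 values in percent, copied from the text.
RESULTS = {
    "FastText": {"accuracy": 80.63, "recall": 79.44, "f1": 79.84},
    "BERT-BiLSTM": {"accuracy": 95.06, "recall": 95.56, "f1": 95.19},
}


def improvements(base: str, new: str) -> dict:
    """Absolute improvement of `new` over `base` in percentage points,
    rounded to two decimals to match the table's precision."""
    return {m: round(RESULTS[new][m] - RESULTS[base][m], 2) for m in RESULTS[base]}


print(improvements("FastText", "BERT-BiLSTM"))
# {'accuracy': 14.43, 'recall': 16.12, 'f1': 15.35}
```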

New data can also be used as model input to predict its emotion class. For a single input, "must be kept secret; no leaks allowed", the predicted class is 2 (positive). For a list of inputs { "announce this matter publicly", "this matter must be kept secret; not a word may leak out", "this matter must be kept secret and no one else may know", "the information is bound to carry a risk of leakage", "be vigilant against leaks and guard against leaks" }, the classification result is {1, 2, 2, 1, 0}, i.e. {negative, positive, positive, negative, neutral}. These results indicate that the model of this scheme predicts well on new texts.
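Turning the fully connected output layer's scores into the label names used above amounts to an argmax over each output vector followed by the label mapping defined during preprocessing (0 neutral, 1 negative, 2 positive). The probability vectors in the example are invented for illustration and the function name `decode` is hypothetical; the actual model outputs are not given in the text.

```python
from typing import List, Sequence

# Label mapping stated in the text.
LABEL_NAMES = {0: "neutral", 1: "negative", 2: "positive"}


def decode(probabilities: Sequence[Sequence[float]]) -> List[str]:
    """Map each output probability vector to a label name by taking the
    index of its largest entry (argmax)."""
    names = []
    for probs in probabilities:
        best = max(range(len(probs)), key=probs.__getitem__)
        names.append(LABEL_NAMES[best])
    return names
```

For example, hypothetical outputs `[[0.1, 0.8, 0.1], [0.05, 0.05, 0.9], [0.2, 0.1, 0.7], [0.3, 0.6, 0.1], [0.7, 0.2, 0.1]]` decode to class ids {1, 2, 2, 1, 0}, that is negative, positive, positive, negative, neutral.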

The experimental data show that the BERT-BiLSTM model of this scheme, a pre-trained bidirectional neural network, captures context considerably better than traditional text classification models and achieves better results for emotion analysis in the confidentiality domain.

The relative arrangement of the components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

In this specification, the embodiments are described progressively, each focusing on its differences from the others; for identical or similar parts, the embodiments may be referred to one another. Since the system disclosed in the embodiments corresponds to the disclosed method, it is described relatively briefly; for relevant details, refer to the description of the method.

The elements and method steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or a combination thereof, and the elements and steps of the examples have been generally described in terms of functionality in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Those of ordinary skill in the art may implement the described functionality using different methods for each particular application, but such implementation is not considered to be beyond the scope of the present invention.

Those of ordinary skill in the art will appreciate that all or a portion of the steps of the methods described above may be implemented by a program that instructs associated hardware, and the program may be stored on a computer readable storage medium such as a read-only memory, a magnetic or optical disk, etc. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits, and accordingly, each module/unit in the above embodiments may be implemented in hardware or may be implemented in a software functional module. The present invention is not limited to any specific form of combination of hardware and software.

It should be noted that the foregoing embodiments are merely specific embodiments used to illustrate, not to limit, the technical solutions of the present invention, and the protection scope of the invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed by the present invention, they may still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent substitutions for some of their technical features; such modifications, variations or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.