Flow platform monitoring method and system based on word frequency weight
Technical Field
The present application relates to the field of network multimedia, and in particular to a method and system for monitoring a flow platform based on word frequency weights.
Background
Existing flow platforms face the problem that their vocabulary is fragmented, which makes key vocabulary difficult to extract. Filtering methods based on centroid vectors exist in the prior art, but when word occurrence frequencies are disordered, such methods struggle to achieve the expected effect.
Therefore, a targeted method and system for flow platform monitoring based on word frequency weights are urgently needed.
Disclosure of Invention
The invention aims to provide a flow platform monitoring method and system based on word frequency weights. A cloud computing platform is built to acquire Internet data streams; feature vectors are processed by syntactic analysis and semantic analysis, and weight values are assigned according to the occurrence frequency of word components; cosine values are calculated to obtain a centroid vector of related comments; and alarm judgment is performed on the centroid vector. This makes compliance easier to determine and improves protection efficiency.
In a first aspect, the present application provides a method for monitoring a flow platform based on word frequency weights, where the method includes:
building a cloud computing platform on a server, and building a syntax model and a semantic analysis model, wherein the syntax model and the semantic analysis model are located on different core entities of the cloud computing platform, and the verification body is the entity server at the central position of the cloud computing platform;
acquiring a data stream of an Internet platform according to an acquisition strategy, inputting the feature vectors of the data stream into the syntax model for sentence segmentation, and removing emoticons to obtain word components;
counting the number of occurrences of each word component per unit time, and assigning a corresponding weight value according to that count;
inputting the word components into the semantic analysis model and outputting word meanings, that is, simple sentences with a unique meaning from which broad-category words have been removed; recombining the word meanings into new sentences, inserting the weight values into the new sentences, and completing vectorization to obtain a second feature vector;
wherein the second feature vector comprises a plurality of weight values corresponding to different word meanings;
calculating the cosine of the angle between each pair of second feature vectors, and forming a centroid vector from the second feature vectors whose cosine values are higher than a threshold value;
calculating the accumulated value of the weight values of the centroid vector, the accumulated value serving as a measure of the relevance of comments;
filtering the word meanings whose centroid vector values are lower than a second threshold value, and judging whether the word meanings include designated keywords; if so, further judging whether the sentences containing those word meanings form a designated meaning; if they do, confirming that the corresponding second feature vector meets the alarm condition and sending alarm information; if they do not, confirming that the corresponding second feature vector is compliant.
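The preprocessing and weighting steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the neural syntax and semantic models are replaced by regular-expression sentence splitting, a rough emoticon filter, and a hypothetical stop-word set (BROAD_WORDS) standing in for the removal of broad-category words; the weight of a word component is assumed to be its count divided by the total count within one time window; and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the neural syntax/semantic models described above.
SENTENCE_SPLIT = re.compile(r"[.!?;。！？；]+")
# Rough emoticon filter (assumption): common emoji code-point ranges plus a few ASCII faces.
EMOTICON = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]|[:;]-?[()DPp]")
# Hypothetical set of broad-category words removed by the "semantic" step.
BROAD_WORDS = {"the", "a", "an", "is", "are", "this", "that"}


def split_sentences(text):
    """Syntax-model stand-in: remove emoticons, then break text into sentences."""
    cleaned = EMOTICON.sub(" ", text)
    return [s.strip() for s in SENTENCE_SPLIT.split(cleaned) if s.strip()]


def word_components(sentence):
    """Split one sentence into lower-case word components."""
    return re.findall(r"\w+", sentence.lower())


def frequency_weights(messages):
    """Count word components in one time window; weight = count / total count."""
    counts = Counter()
    for text in messages:
        for sentence in split_sentences(text):
            counts.update(word_components(sentence))
    total = sum(counts.values()) or 1
    return {word: n / total for word, n in counts.items()}


def second_feature_vector(text, weights, vocabulary):
    """Drop broad-category words, then place each kept word's weight in its vocabulary slot."""
    kept = {w for s in split_sentences(text) for w in word_components(s)
            if w not in BROAD_WORDS}
    return [weights.get(w, 0.0) if w in kept else 0.0 for w in vocabulary]


if __name__ == "__main__":
    window = ["Great stream :) buy now!", "Buy now, limited offer 😀", "Nice stream."]
    weights = frequency_weights(window)
    vocab = sorted(weights)
    print(vocab)
    print(second_feature_vector(window[0], weights, vocab))
```

Normalizing by the window total keeps weights comparable across windows of different traffic volume; a real deployment might instead use raw counts or a TF-IDF style scheme.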
With reference to the first aspect, in a first possible implementation manner of the first aspect, the method further includes risk assessment, attack association analysis, and situation awareness.
With reference to the first aspect, in a second possible implementation manner of the first aspect, acquiring the data stream of the Internet platform includes encoding and decoding the data stream.
With reference to the first aspect, in a third possible implementation manner of the first aspect, the kernels of the semantic analysis model and the syntax model both use neural network models.
In a second aspect, the present application provides a flow platform monitoring system based on word frequency weights, the system comprising a processor and a memory:
the memory is configured to store program code and transmit the program code to the processor;
the processor is configured to perform, according to instructions in the program code, the method according to the first aspect or any one of its possible implementation manners.
In a third aspect, the present application provides a computer-readable storage medium for storing program code, the program code being used to perform the method according to the first aspect or any one of its possible implementation manners.
The invention thus provides a flow platform monitoring method and system based on word frequency weights: a cloud computing platform is built to acquire Internet data streams, weight values are assigned according to the occurrence frequency of word components obtained through syntactic and semantic analysis, cosine values are calculated to obtain a centroid vector of related comments, and alarm judgment is performed on the centroid vector, making compliance easier to determine and improving protection efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the present invention can be more easily understood by those skilled in the art and the scope of protection of the present invention is more clearly defined.
FIG. 1 is a flowchart of the flow platform monitoring method based on word frequency weights provided by the present application. The method comprises the following steps:
building a cloud computing platform on a server, and building a syntax model and a semantic analysis model, wherein the syntax model and the semantic analysis model are located on different core entities of the cloud computing platform, and the verification body is the entity server at the central position of the cloud computing platform;
acquiring a data stream of an Internet platform according to an acquisition strategy, inputting the feature vectors of the data stream into the syntax model for sentence segmentation, and removing emoticons to obtain word components;
counting the number of occurrences of each word component per unit time, and assigning a corresponding weight value according to that count;
inputting the word components into the semantic analysis model and outputting word meanings, that is, simple sentences with a unique meaning from which broad-category words have been removed; recombining the word meanings into new sentences, inserting the weight values into the new sentences, and completing vectorization to obtain a second feature vector;
wherein the second feature vector comprises a plurality of weight values corresponding to different word meanings;
calculating the cosine of the angle between each pair of second feature vectors, and forming a centroid vector from the second feature vectors whose cosine values are higher than a threshold value;
calculating the accumulated value of the weight values of the centroid vector, the accumulated value serving as a measure of the relevance of comments;
filtering the word meanings whose centroid vector values are lower than a second threshold value, and judging whether the word meanings include designated keywords; if so, further judging whether the sentences containing those word meanings form a designated meaning; if they do, confirming that the corresponding second feature vector meets the alarm condition and sending alarm information; if they do not, confirming that the corresponding second feature vector is compliant.
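The cosine, centroid, accumulation and alarm-judgment steps above can be sketched as follows, again as a minimal illustration rather than the claimed implementation. Assumptions: a second feature vector joins the cluster if its cosine with at least one other vector exceeds the threshold; the centroid is the component-wise mean of the cluster; "filtering" is read as discarding word meanings whose centroid value is below the second threshold; the sentence-level "designated meaning" check is a hypothetical rule that every word of some designated phrase appears in the sentence; and a sentence with no designated keyword is treated as compliant, a branch the text leaves implicit. All names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid_cluster(vectors, threshold):
    """Cluster members (cosine > threshold with at least one other vector) and their mean."""
    n = len(vectors)
    members = [i for i in range(n)
               if any(i != j and cosine(vectors[i], vectors[j]) > threshold for j in range(n))]
    if not members:
        return [], []
    dim = len(vectors[0])
    centroid = [sum(vectors[i][k] for i in members) / len(members) for k in range(dim)]
    return members, centroid


def judge(words, vocabulary, centroid, second_threshold, keywords, phrases):
    """Return 'alarm' or 'compliant' for one clustered sentence (given as its words)."""
    # Assumption: discard word meanings whose centroid value is below the second threshold.
    salient = {w for w, c in zip(vocabulary, centroid) if c >= second_threshold}
    if not (salient & keywords & set(words)):
        return "compliant"  # no designated keyword present (branch assumed compliant)
    # Hypothetical "designated meaning" test: every word of some designated phrase is present.
    if any(set(p) <= set(words) for p in phrases):
        return "alarm"
    return "compliant"


if __name__ == "__main__":
    vocab = ["buy", "now", "offer", "stream"]
    vectors = [[0.2, 0.2, 0.1, 0.0], [0.2, 0.2, 0.0, 0.0], [0.0, 0.0, 0.0, 0.3]]
    members, centroid = centroid_cluster(vectors, threshold=0.8)
    relevance = sum(centroid)  # accumulated weight value of the centroid vector
    print(members, centroid, relevance)
    print(judge(["buy", "now"], vocab, centroid, 0.1, {"buy"}, [["buy", "now"]]))
```

In this toy run the first two vectors point in nearly the same direction and form the cluster, while the third is orthogonal to both and is left out; the pairwise comparison is quadratic in the number of vectors, which is acceptable for a sketch but would need indexing or batching at platform scale.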
The cloud computing platform further comprises entity servers at edge positions. These edge entity servers are called to trace the corresponding word components and cluster structure, and they send the suspected track and suspected source point to the entity server at the central position. The central entity server invokes the computing capacity of the cloud computing platform to determine the source point of the corresponding data stream and notifies the edge entity servers to block that source point.
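The edge-to-center tracing and blocking flow described above can be sketched as follows. The classes EdgeServer and CentralServer, the use of IP-like strings as source points, and the rule that the source point with the most flagged hits across all edge servers is taken as the origin are assumptions for illustration only; the text does not specify how the central server determines the source point.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EdgeServer:
    """Hypothetical edge-position server: logs observed traffic and enforces blocks."""
    name: str
    log: list = field(default_factory=list)  # (source_point, word_component) pairs
    blocked: set = field(default_factory=set)

    def trace(self, flagged_words):
        """Suspected track: log entries whose word component was flagged."""
        return [(src, w) for src, w in self.log if w in flagged_words]

    def block(self, source_point):
        self.blocked.add(source_point)


class CentralServer:
    """Hypothetical central-position server that resolves the source point."""

    def __init__(self, edges):
        self.edges = edges

    def resolve_and_block(self, flagged_words):
        # Aggregate the suspected tracks reported by every edge server.
        hits = Counter(src for edge in self.edges for src, _ in edge.trace(flagged_words))
        if not hits:
            return None
        # Assumption: the source point with the most flagged hits is the origin.
        source, _ = hits.most_common(1)[0]
        for edge in self.edges:
            edge.block(source)  # notify each edge server to block the source point
        return source


if __name__ == "__main__":
    e1 = EdgeServer("edge-1", log=[("10.0.0.5", "buy"), ("10.0.0.7", "stream")])
    e2 = EdgeServer("edge-2", log=[("10.0.0.5", "buy"), ("10.0.0.9", "buy")])
    central = CentralServer([e1, e2])
    print(central.resolve_and_block({"buy"}), e1.blocked, e2.blocked)
```

Keeping the aggregation on the central server mirrors the division of labour in the text: edge servers only filter their own logs, while the cross-edge comparison that needs the platform's computing capacity happens in one place.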
In some preferred embodiments, the method further comprises risk assessment, attack association analysis, and situational awareness.
In some preferred embodiments, acquiring the data stream of the Internet platform includes encoding and decoding the data stream.
In some preferred embodiments, the kernels of the semantic analysis model and the syntactic model both use neural network models.
The application provides a flow platform monitoring system based on word frequency weight, which comprises a processor and a memory:
the memory is configured to store program code and transmit the program code to the processor;
the processor is configured to perform, according to instructions in the program code, the method according to any of the embodiments of the first aspect.
The present application provides a computer readable storage medium for storing program code for performing the method of any one of the embodiments of the first aspect.
In a specific implementation, the present invention further provides a computer storage medium that may store a program; when executed, the program may perform some or all of the steps in the embodiments of the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
Those skilled in the art will clearly understand that the techniques in the embodiments of the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the embodiments, or in some parts of the embodiments, of the present invention.
For the same or similar parts among the embodiments in this specification, reference may be made from one embodiment to another. In particular, the system embodiment is substantially similar to the method embodiment and is therefore described relatively briefly; for related details, refer to the description of the method embodiment.
The embodiments of the present invention described above do not limit the scope of the present invention.