Summary of the invention
Embodiments of the present invention provide a semantic similarity calculation method and device, so as to determine the similarity between sentences more accurately.
According to a first aspect of the embodiments of the present invention, a semantic similarity calculation method is provided, including:
preprocessing a first sentence and a second sentence of a sentence pair respectively, and extracting a first syntax corresponding to the first sentence, a second syntax corresponding to the second sentence, and statistical features between the first sentence and the second sentence;
converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determining a corresponding preliminary representation of the first sentence and a preliminary representation of the second sentence according to the first feature matrix, the second feature matrix and a preset first deep neural network model;
determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, a statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
In one embodiment, converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to determine the corresponding first feature matrix and second feature matrix, includes:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
splicing the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and splicing the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
In one embodiment, obtaining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix and the preset first deep neural network model includes:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In one embodiment, determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the feature vector corresponding to the statistical features and the preset second deep neural network model includes:
performing point-wise subtraction and point-wise multiplication between the preliminary representation of the first sentence and the preliminary representation of the second sentence respectively, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
encoding the statistical features into a vector, to obtain the corresponding statistical feature vector;
splicing the statistical feature vector, the geometric distance feature matrix and the angular distance feature matrix, to obtain a splicing result;
taking the splicing result as the input of the second deep neural network model, to calculate the similarity between the first sentence and the second sentence.
In one embodiment, determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence includes:
when the similarity between the first sentence and the second sentence is greater than a preset similarity, determining that the first sentence and the second sentence are similar;
when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity, determining that the first sentence and the second sentence are dissimilar.
According to a second aspect of the embodiments of the present invention, a semantic similarity calculation device is provided, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
preprocess a first sentence and a second sentence of a sentence pair respectively, and extract a first syntax corresponding to the first sentence, a second syntax corresponding to the second sentence, and statistical features between the first sentence and the second sentence;
convert the words and parts of speech of the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determine a corresponding preliminary representation of the first sentence and a preliminary representation of the second sentence according to the first feature matrix, the second feature matrix and a preset first deep neural network model;
determine the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, a statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determine whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
In one embodiment, converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to determine the corresponding first feature matrix and second feature matrix, includes:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
splicing the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and splicing the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
In one embodiment, obtaining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix and the preset first deep neural network model includes:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In one embodiment, determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the feature vector corresponding to the statistical features and the preset second deep neural network model includes:
performing point-wise subtraction and point-wise multiplication between the preliminary representation of the first sentence and the preliminary representation of the second sentence respectively, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
encoding the statistical features into a vector, to obtain the corresponding statistical feature vector;
splicing the statistical feature vector, the geometric distance feature matrix and the angular distance feature matrix, to obtain a splicing result;
taking the splicing result as the input of the second deep neural network model, to calculate the similarity between the first sentence and the second sentence.
In one embodiment, determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence includes:
when the similarity between the first sentence and the second sentence is greater than a preset similarity, determining that the first sentence and the second sentence are similar;
when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity, determining that the first sentence and the second sentence are dissimilar.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and cannot limit the present invention.
Other features and advantages of the present invention will be set forth in the following description, and will partly become apparent from the description or be understood through the implementation of the present invention. The objects and other advantages of the present invention can be realized and obtained by the structures particularly pointed out in the written description, the claims and the accompanying drawings.
The technical solutions of the present invention will be described in further detail below with reference to the accompanying drawings and embodiments.
Detailed description of the embodiments
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings indicate the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. On the contrary, they are merely examples of devices and methods consistent with some aspects of the present invention as detailed in the appended claims.
Fig. 1 is a flowchart of a semantic similarity calculation method according to an exemplary embodiment. The semantic similarity calculation method may be applied in a terminal device or a server, where the terminal device may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like. As shown in Fig. 1, the method includes steps S101-S105:
In step S101, the first sentence and the second sentence of a sentence pair are preprocessed respectively, and a first syntax corresponding to the first sentence, a second syntax corresponding to the second sentence, and statistical features between the first sentence and the second sentence are extracted.
Here, the syntax refers to the grammatical statistical features of a sentence, including part-of-speech features, word-order features, similarity features between parts of speech, matching-degree features between the sentences, word matching-degree features, and the like.
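The embodiment does not fix how these statistics are computed. Purely as a minimal sketch, assuming the sentences have already been tokenized and POS-tagged during preprocessing, a few statistics of the kind listed above (word matching degree, part-of-speech overlap, a length-based matching cue) could be derived as follows; the helper name pair_statistics and the concrete feature choices are illustrative assumptions rather than part of the embodiment.

```python
def pair_statistics(words_a, pos_a, words_b, pos_b):
    """Return a small list of hand-crafted statistics for a sentence pair."""
    set_a, set_b = set(words_a), set(words_b)
    # Word matching degree: Jaccard overlap of the two word sets.
    word_overlap = len(set_a & set_b) / max(len(set_a | set_b), 1)
    # Part-of-speech overlap as a rough grammatical similarity signal.
    pos_overlap = len(set(pos_a) & set(pos_b)) / max(len(set(pos_a) | set(pos_b)), 1)
    # Length ratio as a sentence-level matching-degree cue.
    length_ratio = min(len(words_a), len(words_b)) / max(len(words_a), len(words_b), 1)
    return [word_overlap, pos_overlap, length_ratio]


# Example with an already tokenized and POS-tagged sentence pair.
stats = pair_statistics(
    ["how", "old", "are", "you"], ["WRB", "JJ", "VBP", "PRP"],
    ["what", "is", "your", "age"], ["WP", "VBZ", "PRP$", "NN"],
)
print(stats)  # [0.0, 0.0, 1.0] for this pair
```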
In step S102, the words and parts of speech of the first sentence and the second sentence are converted into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
In step S103, a corresponding preliminary representation of the first sentence and a preliminary representation of the second sentence are determined according to the first feature matrix, the second feature matrix and a preset first deep neural network model;
In step S104, the similarity between the first sentence and the second sentence is determined according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, a statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
In step S105, whether the first sentence and the second sentence are similar is determined according to the similarity between the first sentence and the second sentence.
In this embodiment, the spatial distance and the cosine distance between the sentences are determined according to the words and parts of speech of the sentences, the statistical features between the sentence pair and the first deep neural network model, and the similarity between the sentences is then determined according to the spatial distance and the cosine distance. In this way, word features, word-order features, phrase features and sentence-level statistical features are fused, so that the similarity between the sentences can be determined more accurately.
Fig. 2 is a flowchart of step S102 in a semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 2, in one embodiment, the above step S102 includes steps S201-S203:
In step S201, the words in the first sentence and the second sentence are converted into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
In step S202, the parts of speech in the first sentence and the second sentence are converted into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
In step S203, the first word feature matrix and the first part-of-speech feature matrix are spliced to obtain the first feature matrix, and the second word feature matrix and the second part-of-speech feature matrix are spliced to obtain the second feature matrix.
In this embodiment, the words and parts of speech of each sentence are respectively characterized as vectors, and the feature matrix corresponding to the sentence is thereby obtained, so that the similarity between the sentences can subsequently be determined according to the feature matrices.
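As a minimal sketch of steps S201-S203, assuming the word and part-of-speech embedding tables have already been trained (for example with a word2vec model over words and an analogous model over part-of-speech sequences for pos2vec), the per-sentence feature matrix could be built by row-wise lookup and column-wise splicing as follows. The lookup dictionaries, the dimensions and the helper name build_feature_matrix are illustrative assumptions.

```python
import numpy as np

# Stand-ins for the trained word2vec / pos2vec lookup tables; random vectors are
# used only so the sketch runs, and the dimensions (50 and 10) are arbitrary.
rng = np.random.default_rng(0)
word_vec = {w: rng.normal(size=50) for w in ["how", "old", "are", "you"]}
pos_vec = {p: rng.normal(size=10) for p in ["WRB", "JJ", "VBP", "PRP"]}

def build_feature_matrix(words, pos_tags):
    """Splice the word-vector matrix and the part-of-speech-vector matrix (step S203)."""
    word_matrix = np.stack([word_vec[w] for w in words])      # (sentence length, 50)
    pos_matrix = np.stack([pos_vec[p] for p in pos_tags])     # (sentence length, 10)
    return np.concatenate([word_matrix, pos_matrix], axis=1)  # (sentence length, 60)

feature_matrix_a = build_feature_matrix(["how", "old", "are", "you"],
                                        ["WRB", "JJ", "VBP", "PRP"])
print(feature_matrix_a.shape)  # (4, 60)
```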
Fig. 3 is a flowchart of another semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 3, in one embodiment, the above step S103 includes step S301:
In step S301, the first feature matrix and the second feature matrix are respectively taken as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In this embodiment, the first feature matrix is taken as the input of the first deep neural network model to obtain the preliminary representation of the first sentence, and the second feature matrix is taken as the input of the first deep neural network model to obtain the preliminary representation of the second sentence, so that the similarity between the sentences can subsequently be determined according to the preliminary representations of the sentences.
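The embodiment does not specify the architecture of the first deep neural network model; the sketch below, written with PyTorch, therefore stands in a bidirectional LSTM with mean pooling purely for illustration. The class name SentenceEncoder, the hidden size and the pooling choice are assumptions.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """Illustrative stand-in for the first deep neural network model: a BiLSTM over
    the per-token feature matrix, mean-pooled into a fixed-size preliminary
    sentence representation."""
    def __init__(self, feature_dim=60, hidden_size=128):
        super().__init__()
        self.rnn = nn.LSTM(feature_dim, hidden_size,
                           batch_first=True, bidirectional=True)

    def forward(self, feature_matrix):          # (batch, sentence length, feature_dim)
        outputs, _ = self.rnn(feature_matrix)   # (batch, sentence length, 2 * hidden_size)
        return outputs.mean(dim=1)              # preliminary representation, (batch, 256)

encoder = SentenceEncoder()
feature_matrix_a = torch.randn(1, 4, 60)        # e.g. the matrix built in step S102
preliminary_a = encoder(feature_matrix_a)       # R_A
preliminary_b = encoder(torch.randn(1, 5, 60))  # R_B from the second feature matrix
```

Feeding both feature matrices through the same encoder matches the step above, in which the first feature matrix and the second feature matrix are respectively taken as the input of one first deep neural network model.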
Fig. 4 is a flowchart of step S104 in a semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 4, in one embodiment, the above step S104 includes steps S401-S404:
In step S401, point-wise subtraction and point-wise multiplication are performed between the preliminary representation of the first sentence and the preliminary representation of the second sentence respectively, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix.
If the input is two sentences A and B whose preliminary representations are denoted R_A and R_B, the geometric distance between them is expressed as dist(|R_A - R_B|) and the angular distance is expressed as angle(R_A ⊙ R_B).
In step S402, the statistical features are encoded into a vector, to obtain the corresponding statistical feature vector;
In step S403, the statistical feature vector, the geometric distance feature matrix and the angular distance feature matrix are spliced, to obtain a splicing result;
In step S404, the splicing result is taken as the input of the second deep neural network model, to calculate the similarity between the first sentence and the second sentence.
In this embodiment, the statistical feature vector, the geometric distance feature matrix and the angular distance feature matrix between the sentences are taken as the input of the second deep neural network model, which transforms the final representation of the sentences to obtain the probability that the two sentences are similar, i.e., the similarity between the sentences. In this way, word features, word-order features, phrase features and sentence-level statistical features are fused, so that the similarity between the sentences can be determined more accurately.
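As a sketch of steps S401-S404, the point-wise subtraction and multiplication, the splicing with the statistical feature vector, and the second deep neural network model could be combined as below. The small feed-forward network, its sizes and the class name SimilarityHead are assumptions; the embodiment only requires some second deep neural network model applied to the splicing result.

```python
import torch
import torch.nn as nn

class SimilarityHead(nn.Module):
    """Illustrative stand-in for the second deep neural network model: a small
    feed-forward network over [ statistics ; |R_A - R_B| ; R_A * R_B ]
    that outputs a similarity in [0, 1]."""
    def __init__(self, rep_dim=256, stat_dim=3, hidden_size=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * rep_dim + stat_dim, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),
            nn.Sigmoid(),
        )

    def forward(self, rep_a, rep_b, stats):
        geometric = torch.abs(rep_a - rep_b)  # point-wise subtraction -> geometric distance features
        angular = rep_a * rep_b               # point-wise multiplication -> angular distance features
        spliced = torch.cat([stats, geometric, angular], dim=1)  # splicing result (step S403)
        return self.mlp(spliced)              # similarity of the sentence pair (step S404)

head = SimilarityHead()
rep_a, rep_b = torch.randn(1, 256), torch.randn(1, 256)  # preliminary representations from step S103
stats = torch.tensor([[0.0, 0.0, 1.0]])                  # statistical feature vector from step S402
similarity = head(rep_a, rep_b, stats)
```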
Fig. 5 is a flowchart of step S105 in a semantic similarity calculation method according to an exemplary embodiment.
As shown in Fig. 5, in one embodiment, the above step S105 includes steps S501-S502:
In step S501, when the similarity between the first sentence and the second sentence is greater than a preset similarity, it is determined that the first sentence and the second sentence are similar;
In step S502, when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity, it is determined that the first sentence and the second sentence are dissimilar.
In this embodiment, a preset similarity may be set, for example 80%; the two sentences are then determined to be similar when the similarity between them is greater than 80%, and otherwise the two sentences are determined to be dissimilar.
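A minimal sketch of the thresholding in steps S501-S502, assuming the similarity is a value in [0, 1] and using the 80% example above as the preset similarity:

```python
PRESET_SIMILARITY = 0.8  # the 80% example used above

def is_similar(similarity, preset=PRESET_SIMILARITY):
    """Similar only when the similarity is strictly greater than the preset value
    (step S501); less than or equal to the preset counts as dissimilar (step S502)."""
    return similarity > preset

print(is_similar(0.92))  # True
print(is_similar(0.80))  # False
```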
The following is a device embodiment of the present invention, which can be used to execute the method embodiments of the present invention.
According to a second aspect of the embodiments of the present invention, a semantic similarity calculation device is provided, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
preprocess a first sentence and a second sentence of a sentence pair respectively, and extract a first syntax corresponding to the first sentence, a second syntax corresponding to the second sentence, and statistical features between the first sentence and the second sentence;
convert the words and parts of speech of the first sentence and the second sentence into vectors respectively, to obtain a corresponding first feature matrix and second feature matrix;
determine a corresponding preliminary representation of the first sentence and a preliminary representation of the second sentence according to the first feature matrix, the second feature matrix and a preset first deep neural network model;
determine the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, a statistical feature vector corresponding to the statistical features, and a preset second deep neural network model;
determine whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence.
In one embodiment, converting the words and parts of speech of the first sentence and the second sentence into vectors respectively, to determine the corresponding first feature matrix and second feature matrix, includes:
converting the words in the first sentence and the second sentence into word vectors respectively using word2vec, to obtain a first word feature matrix corresponding to the first sentence and a second word feature matrix corresponding to the second sentence;
converting the parts of speech in the first sentence and the second sentence into part-of-speech vectors respectively using pos2vec, to obtain a first part-of-speech feature matrix corresponding to the first sentence and a second part-of-speech feature matrix corresponding to the second sentence;
splicing the first word feature matrix and the first part-of-speech feature matrix to obtain the first feature matrix, and splicing the second word feature matrix and the second part-of-speech feature matrix to obtain the second feature matrix.
In one embodiment, obtaining the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence according to the first feature matrix, the second feature matrix and the preset first deep neural network model includes:
taking the first feature matrix and the second feature matrix respectively as the input of the first deep neural network model, to obtain the corresponding preliminary representation of the first sentence and preliminary representation of the second sentence.
In one embodiment, determining the similarity between the first sentence and the second sentence according to the preliminary representation of the first sentence, the preliminary representation of the second sentence, the feature vector corresponding to the statistical features and the preset second deep neural network model includes:
performing point-wise subtraction and point-wise multiplication between the preliminary representation of the first sentence and the preliminary representation of the second sentence respectively, to obtain a corresponding geometric distance feature matrix and angular distance feature matrix;
encoding the statistical features into a vector, to obtain the corresponding statistical feature vector;
splicing the statistical feature vector, the geometric distance feature matrix and the angular distance feature matrix, to obtain a splicing result;
taking the splicing result as the input of the second deep neural network model, to calculate the similarity between the first sentence and the second sentence.
In one embodiment, determining whether the first sentence and the second sentence are similar according to the similarity between the first sentence and the second sentence includes:
when the similarity between the first sentence and the second sentence is greater than a preset similarity, determining that the first sentence and the second sentence are similar;
when the similarity between the first sentence and the second sentence is less than or equal to the preset similarity, determining that the first sentence and the second sentence are dissimilar.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system or a computer program product. Therefore, the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, the device (system) and the computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various modifications and variations to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.