CN109446376A - Method and system for classifying voice through word segmentation - Google Patents

Method and system for classifying voice through word segmentation

Info

Publication number
CN109446376A
CN109446376A (application number CN201811290932.9A)
Authority
CN
China
Prior art keywords
participle
audio
semantic
speech
corpus sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811290932.9A
Other languages
Chinese (zh)
Other versions
CN109446376B (en)
Inventor
魏誉荧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Genius Technology Co Ltd
Original Assignee
Guangdong Genius Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Genius Technology Co Ltd
Priority to CN201811290932.9A
Publication of CN109446376A
Application granted
Publication of CN109446376B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a method and system for classifying speech by word segmentation, wherein the method comprises the following steps: acquiring a corpus sample library, and establishing an audio library and a semantic slot according to the corpus samples in the corpus sample library; acquiring speech audio; comparing the speech audio with the segment audios in the audio library, and generating the matching segment audios found in the speech audio; merging identical segment audios, and counting the frequency with which each merged segment audio occurs in the speech audio; obtaining, according to the semantic slot, the segment semantics corresponding to each segment audio; selecting, according to the segment semantics and the frequencies, the semantics corresponding to one or more semantic sets as classification labels for the speech audio; and classifying the speech audio according to the classification labels. The invention can quickly and accurately classify the content of speech audio by word segmentation, so that the speech audio is stored in an orderly way and is easy to find later.

Description

A method and system for classifying speech by word segmentation
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a method and system for classifying speech by word segmentation.
Background technique
With the rapid development of the Internet, people's lives are becoming more and more intelligent, and people are increasingly accustomed to completing various tasks with intelligent terminals. As artificial-intelligence technology matures, the degree of intelligence of all types of terminals keeps rising. Voice interaction, one of the mainstream forms of human-computer interaction on intelligent terminals, is increasingly favored by users. As a result, users receive a large amount of speech every day.
When a user considers a piece of voice information valuable, the user may choose to store it. However, whether the user stores it manually or accepts a default path and a default name, the voice information is not classified accordingly, which makes it quite cumbersome to find the needed voice information later. Requiring the user to attach a classification label to each piece of voice information one by one and then file it under the corresponding category is a complicated process; moreover, the user may forget to classify, or fail to classify because of some interruption, which again makes later retrieval troublesome.
Therefore, a method for classifying speech is urgently needed, one that intelligently classifies and stores the voice information to be stored, so that the needed voice information can subsequently be found quickly and accurately.
Summary of the invention
The object of the present invention is to provide a method and system for classifying speech by word segmentation, so that the content of speech audio can be classified quickly and accurately through word segmentation, allowing speech audio to be stored in an orderly way and found easily afterwards.
The technical solution provided by the invention is as follows:
The present invention provides a method for classifying speech by word segmentation, characterized by comprising:
acquiring a corpus sample library, and establishing an audio library and a semantic slot according to the corpus samples in the corpus sample library;
acquiring speech audio;
comparing the speech audio with the segment audios in the audio library, and generating the matching segment audios found in the speech audio;
merging identical segment audios, and counting the frequency with which each merged segment audio occurs in the speech audio;
obtaining, according to the semantic slot, the segment semantics corresponding to each segment audio;
selecting, according to the segment semantics and the frequencies, the semantics corresponding to one or more semantic sets as classification labels for the speech audio;
classifying the speech audio according to the classification labels.
Further, acquiring the corpus sample library and establishing the audio library and the semantic slot according to the corpus samples specifically comprises:
acquiring the corpus sample library, and segmenting the corpus samples in it according to a word-segmentation technique to obtain the segments contained in each corpus sample;
acquiring the audio corresponding to each segment, and establishing the audio library from the segment audios and the corresponding segments;
acquiring the semantics corresponding to each segment, and establishing the semantic slot from the segment semantics and the segments.
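The library-building step above can be sketched as follows; `segment`, `get_audio`, and `get_semantics` are hypothetical stand-ins for the segmentation technique and the audio/semantics lookups, which the patent does not specify:

```python
def build_libraries(corpus_samples, segment, get_audio, get_semantics):
    """Build an audio library (segment -> audios) and a semantic slot
    (segment -> semantics) from a corpus sample library."""
    audio_library, semantic_slot = {}, {}
    for sample in corpus_samples:
        for word in segment(sample):
            audio_library.setdefault(word, []).append(get_audio(word))
            semantic_slot[word] = get_semantics(word)
    return audio_library, semantic_slot

# Toy demo: whitespace segmentation, file names standing in for audio clips.
audio_lib, slot = build_libraries(
    ["what is an animal"],
    segment=str.split,
    get_audio=lambda w: w + ".wav",
    get_semantics=lambda w: w,
)
print(sorted(audio_lib))  # ['an', 'animal', 'is', 'what']
```

In practice a segment may map to several audios (different speakers, accents), which is why the audio library stores a list per segment.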
Further, acquiring the corpus sample library and establishing the audio library and the semantic slot further comprises:
acquiring the semantics of each corpus sample and the part of speech of each segment;
parsing the sentence structure of the corpus sample by combining the corpus sample semantics, the segment semantics, and the parts of speech;
if a segment is a keyword in the sentence structure, marking that segment as a key segment.
Further, selecting, according to the segment semantics and the frequencies, the semantics corresponding to one or more semantic sets as classification labels for the speech audio specifically comprises:
forming a corresponding semantic set for each segment semantics, and merging semantic sets whose semantics are identical or similar;
choosing, in combination with the frequencies with which the segment audios occur in the speech audio, the semantics corresponding to one or more merged semantic sets as classification labels for the speech audio.
Further, before choosing the semantics corresponding to one or more merged semantic sets as classification labels, the method comprises:
obtaining, according to the audio library, the target segment corresponding to each segment audio;
judging whether the target segments include a key segment;
and choosing the semantics corresponding to one or more merged semantic sets as classification labels specifically comprises:
if the target segments include a key segment, choosing the semantic set corresponding to the key segment;
choosing, in combination with the frequency with which the key segment occurs in the speech audio, the semantics corresponding to the semantic sets of one or more key segments as classification labels for the speech audio.
The present invention also provides a system for classifying speech by word segmentation, characterized by comprising:
a database module, which acquires a corpus sample library and establishes an audio library and a semantic slot according to the corpus samples in the corpus sample library;
a voice acquisition module, which acquires speech audio;
a matching module, which compares the speech audio acquired by the voice acquisition module with the segment audios in the audio library established by the database module, and generates the matching segment audios found in the speech audio;
a processing module, which merges the identical segment audios obtained by the matching module and counts the frequency with which each merged segment audio occurs in the speech audio;
a semantics acquisition module, which obtains, according to the semantic slot established by the database module, the segment semantics corresponding to the segment audios obtained by the matching module;
an analysis module, which selects, according to the segment semantics obtained by the semantics acquisition module and the frequencies counted by the processing module, one or more semantics as classification labels for the speech audio;
a classification module, which classifies the speech audio according to the classification labels chosen by the analysis module.
Further, the database module specifically comprises:
a segmentation unit, which acquires the corpus sample library and segments the corpus samples in it according to a word-segmentation technique, obtaining the segments contained in each corpus sample;
an acquisition unit, which acquires the audio corresponding to each segment obtained by the segmentation unit;
a library-building unit, which establishes the audio library from the segment audios acquired by the acquisition unit and the corresponding segments obtained by the segmentation unit;
the acquisition unit also acquires the semantics corresponding to each segment obtained by the segmentation unit;
the library-building unit also establishes the semantic slot from the segment semantics acquired by the acquisition unit and the corresponding segments obtained by the segmentation unit.
Further, the database module further comprises:
the acquisition unit, which also acquires the semantics of each corpus sample and the part of speech of each segment obtained by the segmentation unit;
a parsing unit, which parses the sentence structure of the corpus sample by combining the corpus sample semantics, segment semantics, and parts of speech acquired by the acquisition unit;
a marking unit, which marks a segment as a key segment if the parsing unit determines that it is a keyword in the sentence structure.
Further, the analysis module specifically comprises:
a merging unit, which forms a corresponding semantic set for each segment semantics and merges semantic sets whose semantics are identical or similar;
an analysis unit, which chooses, in combination with the frequencies with which the segment audios occur in the speech audio, the semantics corresponding to one or more semantic sets merged by the merging unit as classification labels for the speech audio.
Further, the analysis module further comprises:
a target-segment acquisition unit, which obtains, according to the audio library, the target segment corresponding to each segment audio;
a judging unit, which judges whether the target segments obtained by the target-segment acquisition unit include a key segment;
and the analysis unit specifically comprises:
a choosing subunit, which, if the judging unit determines that the target segments include a key segment, chooses the semantic set corresponding to the key segment;
an analysis subunit, which chooses, in combination with the frequency with which the key segment occurs in the speech audio, the semantics corresponding to the semantic sets of the key segments chosen by the choosing subunit as classification labels for the speech audio.
The method and system for classifying speech by word segmentation provided by the present invention can bring at least the following beneficial effects:
1. A corpus sample library is formed by collecting a large number of corpus samples, and an audio library and a semantic slot are built from it, so that acquired speech audio can later be matched against them and its classification labels obtained.
2. By merging identical segment audios within the speech audio, the range of candidate classification labels is reduced.
3. The classification labels are selected in combination with the frequency with which each segment audio occurs in the speech audio, ensuring that the selected labels represent the intention of the speech audio to the greatest extent.
4. Classification labels are chosen intelligently through word segmentation and the speech audio is then classified, so that all speech audio is stored in an orderly way and the target speech audio can be found quickly and accurately later.
Description of the drawings
In the following, preferred embodiments are described with reference to the drawings in a clear and understandable manner, further explaining the above characteristics, technical features, and advantages of the method and system for classifying speech by word segmentation, as well as their implementation.
Fig. 1 is a flowchart of a first embodiment of the method for classifying speech by word segmentation according to the present invention;
Fig. 2 is a flowchart of a second embodiment of the method;
Fig. 3 and Fig. 4 are flowcharts of a third embodiment of the method;
Fig. 5 is a structural diagram of a fourth embodiment of the system for classifying speech by word segmentation according to the present invention;
Fig. 6 is a structural diagram of a fifth embodiment of the system;
Fig. 7 is a structural diagram of a sixth embodiment of the system.
Description of reference numerals:
1000 system for classifying speech by word segmentation
1100 database module; 1110 segmentation unit; 1120 acquisition unit; 1130 library-building unit; 1140 parsing unit; 1150 marking unit
1200 voice acquisition module
1300 matching module
1400 processing module
1500 semantics acquisition module
1600 analysis module; 1610 merging unit; 1620 target-segment acquisition unit; 1630 judging unit
1640 analysis unit; 1641 choosing subunit; 1642 analysis subunit
1700 classification module
Specific embodiment
In order to explain the embodiments of the invention and the technical solutions in the prior art more clearly, specific embodiments of the invention are described below with reference to the drawings. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings, and other embodiments, from them without creative effort.
For simplicity, each figure only schematically shows the parts relevant to the invention; they do not represent its actual structure as a product. In addition, where several components in a figure share the same structure or function, only one of them is drawn or labeled. Herein, "one" does not only mean "only this one"; it can also mean "more than one".
The first embodiment of the present invention, as shown in Fig. 1, is a method for classifying speech by word segmentation, comprising:
S100: acquire a corpus sample library, and establish an audio library and a semantic slot according to the corpus samples in the corpus sample library.
Specifically, a large number of corpus samples are collected to build the corpus sample library; all corpus samples are then analyzed to obtain the segments they contain together with the corresponding audios and semantics, from which the audio library and the semantic slot are established.
S200: acquire speech audio.
Specifically, the acquired speech audio may be speech that the user inputs in real time. For example, while exchanging voice messages with other users, the user may feel that one or more of the messages contain valuable information that may be needed later and want to save them; to make later retrieval easy, they need to be stored under a classification.
The speech audio may also be downloaded or recorded audio. For example, a recording may contain so much information that the user has no time to go through it piece by piece; in order to quickly and accurately find the needed audio among a large number of recordings, all the audios need to be classified.
S300: compare the speech audio with the segment audios in the audio library, and generate the matching segment audios found in the speech audio.
Specifically, the acquired speech audio is matched one by one against the segment audios in the audio library built from a large number of corpus samples. When a segment audio in the audio library matches some part of the acquired speech audio, the corresponding segment audio is generated for that part, so that the speech audio is split into multiple segment audios.
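The patent does not specify how the acoustic comparison is performed, so a toy sketch can use greedy longest-first string matching over a transcript as a stand-in for matching the speech audio against the segment audios:

```python
def split_into_segments(speech, audio_library):
    """Greedily split the speech into known segments, trying longer
    segments first; unmatched units are skipped one at a time."""
    known = sorted(audio_library, key=len, reverse=True)
    pieces, i = [], 0
    while i < len(speech):
        for w in known:
            if speech.startswith(w, i):
                pieces.append(w)
                i += len(w)
                break
        else:
            i += 1  # no segment matches here; skip this unit
    return pieces

lib = {"what": "what.wav", "animal": "animal.wav", "is": "is.wav"}
print(split_into_segments("whatisananimal", lib))
# ['what', 'is', 'animal']
```

The unmatched "an" is simply skipped here; a real implementation would compare acoustic features rather than characters, but the splitting logic is the same.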
S400: merge identical segment audios, and count the frequency with which each merged segment audio occurs in the speech audio.
Specifically, all the segment audios split out are identified and identical segment audios are merged; the frequency with which each merged segment audio occurs in the speech audio is then counted, a merged segment audio still being counted according to its quantity before merging.
For example, suppose a speech audio is split into 10 segment audios, in which the segment audio "animal" occurs 5 times, "what" occurs 3 times, and "yes" occurs 2 times. After identical segment audios are merged, 3 segment audios remain: the frequency of "animal" is 0.5, the frequency of "what" is 0.3, and the frequency of "yes" is 0.2.
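Under this definition, the merge-and-count step is a simple frequency computation; a sketch using the worked numbers above:

```python
from collections import Counter

def merge_and_count(segment_audios):
    """Merge identical segment audios; each merged segment's frequency is
    its pre-merge count divided by the total number of segment audios."""
    total = len(segment_audios)
    return {seg: n / total for seg, n in Counter(segment_audios).items()}

# Worked example: 10 segments, "animal" x5, "what" x3, "yes" x2.
freqs = merge_and_count(["animal"] * 5 + ["what"] * 3 + ["yes"] * 2)
print(freqs)  # {'animal': 0.5, 'what': 0.3, 'yes': 0.2}
```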
S500: obtain, according to the semantic slot, the segment semantics corresponding to each segment audio.
Specifically, when a segment audio in the audio library matches part of the acquired speech audio, the corresponding segment audio is generated for that part; the segment corresponding to that segment audio is then obtained from the audio library, and the segment semantics are in turn obtained from the segment and the semantic slot.
S600: select, according to the segment semantics and the frequencies, one or more semantics as classification labels for the speech audio.
Specifically, the segment semantics corresponding to the segment audios are ranked in descending order of the frequency with which each segment audio occurs in the speech audio, and the top one or more semantics are chosen as the classification labels of the speech audio.
The above classification method has the system analyze the acquired speech audio and classify it intelligently; equally, the user can also select the classification labels of the speech audio according to the user's own understanding.
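The label-selection rule just described, ranking segment semantics by descending frequency and keeping the top entries, can be sketched as:

```python
def choose_labels(freqs, k=1):
    """Rank semantics by descending frequency and keep the top k as
    classification labels for the speech audio."""
    return sorted(freqs, key=freqs.get, reverse=True)[:k]

freqs = {"what": 0.3, "animal": 0.5, "yes": 0.2}
print(choose_labels(freqs, k=2))  # ['animal', 'what']
```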
S700: classify the speech audio according to the classification labels.
Specifically, the classification labels of the speech audio are obtained. Whether a label was chosen intelligently by the system or selected by the user, the acquired speech audio is classified and stored according to it, which makes later lookup easy.
In this embodiment, a corpus sample library is formed by collecting a large number of corpus samples, and an audio library and a semantic slot are built from it, so that acquired speech audio can later be matched against them and its classification labels obtained.
The invention chooses the classification labels of speech audio intelligently through word segmentation and then classifies the speech audio, so that all speech audio is stored in an orderly way and the target speech audio can be found quickly and accurately afterwards.
Moreover, merging identical segment audios within the speech audio reduces the range of candidate classification labels, and selecting the labels in combination with the frequency with which each segment audio occurs ensures that the selected labels represent the intention of the speech audio to the greatest extent.
The second embodiment of the present invention is a preferred version of the first embodiment, as shown in Fig. 2, comprising:
S110: acquire a corpus sample library, segment the corpus samples in it according to a word-segmentation technique, and obtain the segments contained in each corpus sample.
Specifically, a large number of corpus samples are collected to build the corpus sample library. Corpus samples are not limited to written text; they also include speech, audio, and so on, the difference being that samples such as speech and audio must first be converted into the corresponding text before further processing.
The corpus samples are segmented according to a word-segmentation technique: the structure of each sentence is judged, the part of speech of each word in every sentence is identified, and each sentence is then divided, according to the parts of speech, into segments consisting of characters, words, and phrases. This yields the segments contained in the corpus sample together with their parts of speech.
S120: acquire the audio corresponding to each segment, and establish the audio library from the segment audios and the corresponding segments.
Specifically, the audio corresponding to each segment is acquired. Owing to factors such as the user's age and accent, the same segment may correspond to multiple audios, so as many different audios of the same segment as possible are acquired, so that user speech can later be recognized comprehensively without omissions. The audio library is then built from all the audios, with correspondences established between segments and audios.
S130: acquire the semantics corresponding to each segment, and establish the semantic slot from the segment semantics and the segments.
Specifically, all segments contained in all the corpus samples above are obtained, and the semantic slot is built from the segments and their corresponding semantics, with correspondences established in the semantic slot between segments and segment semantics.
S140: acquire the semantics of each corpus sample and the part of speech of each segment.
S150: parse the sentence structure of the corpus sample by combining the corpus sample semantics, the segment semantics, and the parts of speech.
S160: if a segment is a keyword in the sentence structure, mark it as a key segment.
Specifically, the semantics of each corpus sample and the part of speech of each segment are acquired, and the sentence structure of the corpus sample is parsed by combining the corpus sample semantics, the segment semantics, and the parts of speech.
The part of speech of each segment is judged first: if a segment's part of speech, such as a conjunction, carries no practical meaning, the segment has little influence on the sample semantics and can be excluded at the outset.
Next, the influence of each segment's semantics on the sample semantics is judged: if the sample semantics can still be understood after a segment is deleted, the segment is unimportant; otherwise the segment is a keyword for understanding the sample semantics. Finally, the segments judged to be keywords are marked as key segments.
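The two-stage keyword test might look like the following sketch. The POS tag set and the `understandable_without` oracle are assumptions, since the patent does not say how intelligibility after deletion is judged:

```python
def mark_keywords(segments, pos_of, understandable_without):
    """Mark key segments: skip segments with no practical meaning
    (e.g. conjunctions), then keep a segment only if the sample can no
    longer be understood once it is deleted."""
    function_pos = {"conj", "aux", "punct"}  # assumed tag set
    keys = set()
    for seg in segments:
        if pos_of(seg) in function_pos:
            continue  # no practical meaning; exclude first
        if not understandable_without(seg):
            keys.add(seg)  # deleting it loses the sample semantics
    return keys

pos = {"and": "conj", "what": "pron", "animal": "noun"}.get
keys = mark_keywords(["what", "and", "animal"], pos,
                     understandable_without=lambda s: s != "animal")
print(sorted(keys))  # ['animal']
```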
S200: acquire speech audio.
S300: compare the speech audio with the segment audios in the audio library, and generate the matching segment audios found in the speech audio.
S400: merge identical segment audios, and count the frequency with which each merged segment audio occurs in the speech audio.
S500: obtain, according to the semantic slot, the segment semantics corresponding to each segment audio.
S600: select, according to the segment semantics and the frequencies, one or more semantics as classification labels for the speech audio.
S700: classify the speech audio according to the classification labels.
In this embodiment, the corpus samples are segmented according to a word-segmentation technique to establish the audio library and the semantic slot, and the sentence structure of each corpus sample is parsed by combining the corpus sample semantics, segment semantics, and parts of speech so that the key segments can be judged; this later helps recognize speech audio and choose the corresponding classification labels.
The third embodiment of the present invention is a preferred version of the first and second embodiments, as shown in Fig. 3 and Fig. 4, comprising:
S100: acquire a corpus sample library, and establish an audio library and a semantic slot according to the corpus samples in the corpus sample library.
S200: acquire speech audio.
S300: compare the speech audio with the segment audios in the audio library, and generate the matching segment audios found in the speech audio.
S400: merge identical segment audios, and count the frequency with which each merged segment audio occurs in the speech audio.
S500: obtain, according to the semantic slot, the segment semantics corresponding to each segment audio.
S610: form a corresponding semantic set for each segment semantics, and merge semantic sets whose semantics are identical or similar.
Specifically, each segment semantics forms a corresponding semantic set; the semantics of each set are then identified, sets whose semantics are identical or similar are merged, and the set remaining after merging may be any one of the sets that were merged.
For example, the semantic sets "cup" and "teacup" can be merged; the remaining set after merging is "cup" or "teacup". If before merging the probability of "cup" occurring in the speech audio is 0.3 and that of "teacup" is 0.1, the probability of the remaining merged set is 0.4.
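Merging similar sets and summing their frequencies, as in the "cup"/"teacup" example, can be sketched as follows; the synonym groups are given by hand here, since the patent does not specify how similarity is detected:

```python
def merge_similar(freqs, synonym_groups):
    """Merge identical-or-similar semantic sets: each group collapses to
    one representative whose frequency is the sum over the group."""
    merged = dict(freqs)
    for group in synonym_groups:
        present = [s for s in group if s in merged]
        if len(present) > 1:
            total = sum(merged.pop(s) for s in present)
            merged[present[0]] = total
    return merged

freqs = {"cup": 0.3, "teacup": 0.1, "what": 0.3}
print(merge_similar(freqs, [["cup", "teacup"]]))
# {'what': 0.3, 'cup': 0.4}
```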
S620: obtain, according to the audio library, the target segment corresponding to each segment audio.
S630: judge whether the target segments include a key segment.
Specifically, the target segment corresponding to each segment audio is determined from the correspondence between segments and segment audios in the audio library; it is then judged whether the target segments include one of the key segments above. If so, the key segment can represent the semantics of the speech audio to a certain extent.
S640: choose, in combination with the frequencies with which the segment audios occur in the speech audio, the semantics corresponding to one or more merged semantic sets as classification labels for the speech audio.
Specifically, if the target segments include a key segment, the key segment can represent the semantics of the speech audio to a certain extent, and the classification labels are selected in combination with the frequency with which the key segment occurs in the speech audio. If the target segments include no key segment, the classification labels are selected according to the frequencies with which the target segments occur in the speech audio.
S640 specifically comprises:
S641: if the target segments include a key segment, choose the semantic set corresponding to the key segment.
S642: choose, in combination with the frequency with which the key segment occurs in the speech audio, the semantics corresponding to the semantic sets of one or more key segments as classification labels for the speech audio.
Specifically, if the target segments include key segments, the semantic sets corresponding to the key segments are chosen; the sets are ranked by the frequency with which the key segments occur in the speech audio, and the semantics corresponding to the top one or more sets are chosen as the classification labels.
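Steps S641 and S642 amount to restricting the candidates to key-segment semantic sets before ranking; a sketch under the same toy frequencies:

```python
def choose_labels_with_keywords(freqs, key_segments, k=1):
    """Prefer the semantic sets of key segments; if none are present,
    fall back to all target segments. Rank by descending frequency."""
    candidates = [s for s in freqs if s in key_segments] or list(freqs)
    return sorted(candidates, key=freqs.get, reverse=True)[:k]

freqs = {"what": 0.3, "animal": 0.5, "yes": 0.2}
print(choose_labels_with_keywords(freqs, {"animal"}))  # ['animal']
print(choose_labels_with_keywords(freqs, set(), k=2))  # ['animal', 'what']
```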
S700: classify the speech audio according to the classification labels.
In this embodiment, the speech audio is compared with the audio library and semantic slot built from the corpus samples, so that the segment audios obtained from the speech audio correspond to target segments; the target segments are then matched against the key segments, and the classification labels are chosen in combination with the probabilities with which the segment audios occur in the speech audio, so that the speech audio is classified.
The fourth embodiment of the present invention, as shown in Fig. 5, is a system 1000 for classifying speech by word segmentation, comprising:
A Database module 1100, which obtains a corpus sample library and establishes an audio library and a semantic slot according to the corpus samples in the corpus sample library.
Specifically, the Database module 1100 collects a large number of corpus samples to establish the corpus sample library, then analyzes all the corpus samples to obtain the word segments in each corpus sample together with their corresponding audio, semantics and so on, so as to establish the audio library and the semantic slot.
A voice acquisition module 1200, which obtains the speech audio.
Specifically, the voice acquisition module 1200 obtains speech audio, which may be speech input by the user in real time. For example, while communicating with other users by voice, the user may consider that one or more utterances involve valuable information that may be needed later and therefore should be saved; to make subsequent searching and viewing easier, they need to be saved by category.
The speech audio may also be downloaded or recorded audio. For example, when the amount of recorded audio is large and the user does not have enough time to review each item one by one, all the audio needs to be classified so that the needed audio can be found quickly and accurately among the large number of recordings.
A matching module 1300, which compares the speech audio obtained by the voice acquisition module 1200 with the word-segmentation audio in the audio library established by the Database module 1100, and generates the matched word-segment audio in the speech audio.
Specifically, the matching module 1300 matches the acquired speech audio one by one against the word-segmentation audio in the audio library built from a large number of corpus samples. When a word-segmentation audio in the audio library matches a certain part of the acquired speech audio, the word-segment audio corresponding to that part is generated in the speech audio, so that the speech audio is split into multiple word-segment audio clips.
A processing module 1400, which merges the identical word-segment audio obtained by the matching module 1300 and counts the frequency with which each merged word-segment audio occurs in the speech audio.
Specifically, the processing module 1400 identifies all the word-segment audio clips split out, merges identical clips, and then counts the frequency with which each merged word-segment audio occurs in the speech audio; the merged word-segment audio is still counted according to the quantities before merging.
For example, suppose ten word-segment audio clips are split out of a speech audio, in which "animal" occurs 5 times, "what" occurs 3 times and "is" occurs 2 times. After identical clips are merged, three word-segment audio clips remain: the frequency of "animal" is 0.5, the frequency of "what" is 0.3, and the frequency of "is" is 0.2.
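The merging and frequency counting described above can be sketched as follows, with each clip represented by its recognized word segment; the clip list reproduces the "animal"/"what"/"is" example:

```python
from collections import Counter

def merge_and_count(clips):
    """Merge identical word-segment clips and compute each one's frequency."""
    counts = Counter(clips)           # counted by the quantities before merging
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}

clips = ["animal"] * 5 + ["what"] * 3 + ["is"] * 2   # the 10 clips split out
print(merge_and_count(clips))  # {'animal': 0.5, 'what': 0.3, 'is': 0.2}
```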
A semantic acquisition module 1500, which obtains the word-segment semantics corresponding to the word-segment audio obtained by the matching module 1300, according to the semantic slot established by the Database module 1100.
Specifically, when a word-segmentation audio in the audio library matches a certain part of the acquired speech audio, the word-segment audio corresponding to that part is generated; the word segment corresponding to that clip is then obtained from the audio library, and the semantic acquisition module 1500 obtains the corresponding word-segment semantics from that word segment and the semantic slot.
An analysis module 1600, which selects one or more semantics as the classification label of the speech audio according to the word-segment semantics obtained by the semantic acquisition module 1500 and the frequencies counted by the processing module 1400.
Specifically, the analysis module 1600 arranges the word-segment semantics in descending order of the frequencies with which the corresponding word-segment audio occurs in the speech audio, and chooses the one or more top-ranked semantics as the classification label of the speech audio.
In the above speech classification method, the system analyzes the acquired speech audio and classifies it intelligently; likewise, the user may also select the classification label for the speech audio according to his or her own understanding.
A classification module 1700, which classifies the speech audio according to the classification label chosen by the analysis module 1600.
Specifically, the classification module 1700 obtains the classification label corresponding to the speech audio. Whether the label was chosen intelligently by the system or selected independently by the user, the acquired speech audio is stored by category according to the label, which facilitates subsequent searching.
In this embodiment, the corpus sample library is formed by collecting a large number of corpus samples, and the audio library and the semantic slot are then established, so that the acquired speech audio can subsequently be matched and the classification label corresponding to the speech audio obtained.
The present invention intelligently chooses the classification label of the speech audio by word segmentation and then classifies the speech audio, so that all speech audio is stored in an orderly fashion and the target speech audio can later be found quickly and accurately.
By merging identical word-segment audio in the speech audio, the selection range of classification labels is reduced; in addition, choosing the label in combination with the frequency with which each word-segment audio occurs in the speech audio ensures that the selected classification label represents the intention of the speech audio to the greatest extent.
The fifth embodiment of the present invention is a preferred embodiment of the fourth embodiment above and, as shown in Fig. 6, comprises:
A Database module 1100, which obtains the corpus sample library and establishes the audio library and the semantic slot according to the corpus samples in the corpus sample library.
The Database module 1100 specifically includes:
A word segmentation unit 1110, which obtains the corpus sample library, segments the corpus samples in the corpus sample library according to a word segmentation technique, and obtains the word segments contained in the corpus samples.
Specifically, the word segmentation unit 1110 collects a large number of corpus samples to establish the corpus sample library. Corpus samples are not limited to written text; they also include voice, audio and so on, the difference being that samples such as voice and audio must first be converted into the corresponding text information before subsequent processing.
The corpus samples are segmented according to the word segmentation technique: the structure of each sentence in the corpus sample is judged, the part of speech of each word in every sentence is identified, and each sentence is then divided, according to the parts of speech of its words, into word segments consisting of characters, words and phrases. The word segments contained in the corpus sample and their corresponding parts of speech are thus obtained.
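The patent does not prescribe a particular word segmentation technique. As one illustrative possibility only, a toy greedy forward-maximum-matching segmenter over a small lexicon of word-to-part-of-speech entries might look like this (the lexicon and the sentence are made up for the example; production systems use trained tokenizers):

```python
def segment(sentence, lexicon, max_len=4):
    """Greedy forward maximum matching against a lexicon mapping word -> POS."""
    out, i = [], 0
    while i < len(sentence):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            word = sentence[i:i + size]
            if size == 1 or word in lexicon:
                out.append((word, lexicon.get(word, "unk")))
                i += size
                break
    return out

lexicon = {"熊猫": "noun", "是": "verb", "什么": "pron", "动物": "noun"}
print(segment("熊猫是什么动物", lexicon))
# [('熊猫', 'noun'), ('是', 'verb'), ('什么', 'pron'), ('动物', 'noun')]
```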
An acquisition unit 1120, which obtains the word-segmentation audio corresponding to the word segments obtained by the word segmentation unit 1110.
A database establishment unit 1130, which establishes the audio library according to the word-segmentation audio obtained by the acquisition unit 1120 and the corresponding word segments obtained by the word segmentation unit 1110.
Specifically, the acquisition unit 1120 obtains the audio corresponding to each word segment. Owing to factors such as the user's age and accent, the same word segment may correspond to multiple audio recordings; acquiring as many different recordings of the same word segment as possible allows user speech to be recognized comprehensively later and avoids omissions. The database establishment unit 1130 then establishes the audio library from all the audio, and establishes in the audio library the correspondence between word segments and audio.
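A minimal sketch of such an audio library, assuming each recording is identified by an opaque fingerprint string (all file names here are hypothetical):

```python
# One word segment maps to several recordings covering different ages/accents.
audio_library = {
    "animal": ["animal_v1.wav", "animal_child.wav", "animal_south.wav"],
    "what":   ["what_v1.wav", "what_fast.wav"],
}

def lookup_segment(recording):
    """Reverse lookup: find which word segment a matched recording belongs to."""
    for segment, recordings in audio_library.items():
        if recording in recordings:
            return segment
    return None  # no word segment in the library matches this recording

print(lookup_segment("animal_child.wav"))  # animal
```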
The acquisition unit 1120 obtains the word-segment semantics corresponding to the word segments obtained by the word segmentation unit 1110.
The database establishment unit 1130 establishes the semantic slot according to the word-segment semantics obtained by the acquisition unit 1120 and the corresponding word segments obtained by the word segmentation unit 1110.
Specifically, the acquisition unit 1120 obtains all the word segments contained in all the above corpus samples; the database establishment unit 1130 establishes the semantic slot from all the word segments and their corresponding semantics, and establishes in the semantic slot the correspondence between word segments and word-segment semantics.
The Database module 1100 further includes:
The acquisition unit 1120 obtains the corpus sample semantics corresponding to the corpus sample and the part of speech corresponding to each word segment obtained by the word segmentation unit 1110.
A parsing unit 1140, which parses the sentence structure of the corpus sample in combination with the corpus sample semantics obtained by the acquisition unit 1120, the word-segment semantics and the parts of speech.
A marking unit 1150, which marks a word segment as a key word segment if the parsing unit 1140 determines that the word segment is a keyword in the sentence structure.
Specifically, the acquisition unit 1120 obtains the corpus sample semantics corresponding to the corpus sample and the part of speech corresponding to each word segment, and the parsing unit 1140 then parses the sentence structure of the corpus sample in combination with the corpus sample semantics, the word-segment semantics and the parts of speech of the word segments.
The parsing unit 1140 first judges the part of speech of each word segment. If a word segment's part of speech, such as a conjunction, carries no substantive meaning, that segment has little influence on the corpus sample semantics and can be excluded first.
Next, the parsing unit 1140 judges the influence of each word segment's semantics on the corpus sample semantics: if the corpus sample semantics can still be understood after a given word segment is deleted, the segment is unimportant; otherwise the segment is a keyword for understanding the corpus sample semantics. Finally, the marking unit 1150 marks the word segments judged to be keywords as key word segments.
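The deletion test above requires a semantic model; a crude stand-in, which treats only nouns and verbs as blocking understanding when removed, might be sketched as follows (the `FUNCTION_POS` set and the default heuristic are assumptions for illustration, not part of the disclosure):

```python
FUNCTION_POS = {"conj", "part", "aux"}   # no substantive meaning: excluded first

def mark_key_segments(tagged, still_understandable=None):
    """tagged: list of (word, pos) pairs from one corpus sample.

    still_understandable(word) should answer the deletion test: can the
    sample's semantics still be understood with `word` removed?  The default
    is a crude stand-in that lets only nouns and verbs block understanding.
    """
    content = [(w, p) for w, p in tagged if p not in FUNCTION_POS]
    if still_understandable is None:
        content_pos = dict(content)
        still_understandable = lambda w: content_pos.get(w) not in {"noun", "verb"}
    # A segment whose removal breaks understanding is marked as a key segment.
    return {w for w, _ in content if not still_understandable(w)}

tagged = [("panda", "noun"), ("is", "aux"), ("what", "pron"), ("animal", "noun")]
print(mark_key_segments(tagged))  # {'panda', 'animal'}
```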
A voice acquisition module 1200, which obtains the speech audio.
A matching module 1300, which compares the speech audio obtained by the voice acquisition module 1200 with the word-segmentation audio in the audio library established by the Database module 1100, and generates the matched word-segment audio in the speech audio.
A processing module 1400, which merges the identical word-segment audio obtained by the matching module 1300 and counts the frequency with which each merged word-segment audio occurs in the speech audio.
A semantic acquisition module 1500, which obtains the word-segment semantics corresponding to the word-segment audio obtained by the matching module 1300, according to the semantic slot established by the Database module 1100.
An analysis module 1600, which selects the semantics corresponding to one or more semantic sets as the classification label of the speech audio according to the word-segment semantics obtained by the semantic acquisition module 1500 and the frequencies counted by the processing module 1400.
A classification module 1700, which classifies the speech audio according to the classification label chosen by the analysis module 1600.
In this embodiment, the corpus samples are segmented according to the word segmentation technique to establish the audio library and the semantic slot, and the sentence structure of the corpus sample is parsed in combination with the corpus sample semantics, the word-segment semantics and the parts of speech of the word segments so as to judge the key word segments, which facilitates subsequent recognition of speech audio and selection of the corresponding classification label.
The sixth embodiment of the present invention is a preferred embodiment of the fourth and fifth embodiments above and, as shown in Fig. 7, comprises:
A Database module 1100, which obtains the corpus sample library and establishes the audio library and the semantic slot according to the corpus samples in the corpus sample library.
A voice acquisition module 1200, which obtains the speech audio.
A matching module 1300, which compares the speech audio obtained by the voice acquisition module 1200 with the word-segmentation audio in the audio library established by the Database module 1100, and generates the matched word-segment audio in the speech audio.
A processing module 1400, which merges the identical word-segment audio obtained by the matching module 1300 and counts the frequency with which each merged word-segment audio occurs in the speech audio.
A semantic acquisition module 1500, which obtains the word-segment semantics corresponding to the word-segment audio obtained by the matching module 1300, according to the semantic slot established by the Database module 1100.
An analysis module 1600, which selects one or more semantics as the classification label of the speech audio according to the word-segment semantics obtained by the semantic acquisition module 1500 and the frequencies counted by the processing module 1400.
The analysis module 1600 specifically includes:
A merging unit 1610, which forms semantic sets according to the word-segment semantics and merges semantic sets whose semantics are identical or similar.
Specifically, the merging unit 1610 forms the corresponding semantic sets from the word-segment semantics, each word-segment semantics forming one semantic set; it then identifies the semantics of each semantic set and merges sets whose semantics are identical or similar, the set remaining after merging being any one of the sets that were merged together.
For example, the semantic sets "cup" and "teacup" can be merged; the set remaining after merging is "cup" or "teacup". Before merging, the probability of "cup" occurring in the speech audio is 0.3 and that of "teacup" is 0.1; after merging, the probability of the remaining semantic set is 0.4.
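The merging of similar semantic sets and the summing of their frequencies can be sketched as follows, assuming the similarity judgment is supplied externally as groups of synonyms (the grouping and the representative choice here are illustrative):

```python
def merge_similar(freqs, synonym_groups):
    """Merge semantic sets whose semantics are identical or similar.

    freqs: semantics -> occurrence frequency in the speech audio.
    synonym_groups: iterable of sets of semantics judged similar; each group
    collapses to one representative whose frequencies are summed.
    """
    merged = dict(freqs)
    for group in synonym_groups:
        present = sorted(s for s in group if s in merged)
        if len(present) > 1:
            rep = present[0]  # any member may represent the merged set
            merged[rep] = merged[rep] + sum(merged.pop(s) for s in present[1:])
    return merged

freqs = {"cup": 0.3, "teacup": 0.1, "animal": 0.5}
print(merge_similar(freqs, [{"cup", "teacup"}]))
```

With this grouping, "cup" and "teacup" collapse to a single set whose frequency is their sum, 0.4, while "animal" is unchanged.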
A target word segment acquisition unit 1620, which obtains the target word segments corresponding to the word-segment audio according to the audio library.
A judging unit 1630, which judges whether the target word segments obtained by the target word segment acquisition unit 1620 include the key word segment.
Specifically, the target word segment acquisition unit 1620 determines the target word segment corresponding to each word-segment audio according to the correspondence between word segments and audio in the audio library; the judging unit 1630 then judges whether the target word segments include the above key word segment, which, if present, can to some extent represent the semantics of the speech audio.
An analysis unit 1640, which, in combination with the frequency with which the word-segment audio occurs in the speech audio, chooses the semantics corresponding to one or more semantic sets merged by the merging unit 1610 as the classification label of the speech audio.
Specifically, if the analysis unit 1640 finds that the target word segments include a key word segment, the key word segment can to some extent represent the semantics of the speech audio, so the classification label is chosen in combination with the frequency with which the key word segment occurs in the speech audio. If the target word segments include no key word segment, the classification label is chosen according to the frequency with which the target word segments occur in the speech audio.
The analysis unit 1640 specifically includes:
A selection subunit 1641, which selects the semantic set corresponding to the key word segment if the judging unit 1630 judges that the target word segments include the key word segment.
An analysis subunit 1642, which, in combination with the frequency with which the key word segment occurs in the speech audio, chooses the semantics corresponding to one or more semantic sets of the key word segments selected by the selection subunit 1641 as the classification label of the speech audio.
Specifically, if the target word segments include a key word segment, the selection subunit 1641 selects the semantic set corresponding to the key word segment; the analysis subunit 1642, in combination with the frequency with which the key word segment occurs in the speech audio, arranges the semantic sets corresponding to the key word segments in descending order of frequency and then chooses the semantics corresponding to the one or more top-ranked semantic sets as classification labels.
A classification module 1700, which classifies the speech audio according to the classification label chosen by the analysis module 1600.
In this embodiment, the speech audio is compared with the audio library established from the corpus samples and with the semantic slot, so that the word-segment audio obtained from the speech audio corresponds to target word segments; the target word segments are then matched against the key word segments, and the classification label is chosen in combination with the frequency with which each word-segment audio occurs in the speech audio, thereby classifying the speech audio.
It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art may also make several improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

CN201811290932.9A2018-10-312018-10-31 A method and system for classifying speech by word segmentationActiveCN109446376B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN201811290932.9ACN109446376B (en)2018-10-312018-10-31 A method and system for classifying speech by word segmentation


Publications (2)

Publication NumberPublication Date
CN109446376Atrue CN109446376A (en)2019-03-08
CN109446376B CN109446376B (en)2021-06-25

Family

ID=65549449

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN201811290932.9AActiveCN109446376B (en)2018-10-312018-10-31 A method and system for classifying speech by word segmentation

Country Status (1)

CountryLink
CN (1)CN109446376B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN110827850A (en)*2019-11-112020-02-21广州国音智能科技有限公司Audio separation method, device, equipment and computer readable storage medium
CN111128121A (en)*2019-12-202020-05-08贝壳技术有限公司 Voice information generation method and device, electronic device and storage medium
CN111540353A (en)*2020-04-162020-08-14重庆农村商业银行股份有限公司Semantic understanding method, device, equipment and storage medium
CN112256871A (en)*2020-10-162021-01-22国网江苏省电力有限公司连云港供电分公司Material fulfillment system and method
CN112699237A (en)*2020-12-242021-04-23百度在线网络技术(北京)有限公司Label determination method, device and storage medium
CN113709346A (en)*2021-08-272021-11-26北京八分量信息科技有限公司Big data monitoring system based on decentralization
CN116318716A (en)*2023-02-172023-06-23支付宝(杭州)信息技术有限公司 Authentication method, device, storage medium, electronic equipment and product
CN118261613A (en)*2024-04-162024-06-28湖南三湘银行股份有限公司 An AI-based intelligent marketing and identity authentication method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN1682279A (en)*2002-09-162005-10-12松下电器产业株式会社 System and method for accessing and retrieving media files using speech recognition
US20060271364A1 (en)*2005-05-312006-11-30Robert Bosch CorporationDialogue management using scripts and combined confidence scores
US7734461B2 (en)*2006-03-032010-06-08Samsung Electronics Co., LtdApparatus for providing voice dialogue service and method of operating the same
CN104462600A (en)*2014-12-312015-03-25科大讯飞股份有限公司Method and device for achieving automatic classification of calling reasons
CN105488077A (en)*2014-10-102016-04-13腾讯科技(深圳)有限公司Content tag generation method and apparatus
CN105912521A (en)*2015-12-252016-08-31乐视致新电子科技(天津)有限公司Method and device for parsing voice content
CN106778862A (en)*2016-12-122017-05-31上海智臻智能网络科技股份有限公司A kind of information classification approach and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU Weilin et al.: "Spoken language understanding method based on two-stage classification", Journal of Computer Research and Development (《计算机研究与发展》)*

Cited By (13)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN110827850B (en)*2019-11-112022-06-21广州国音智能科技有限公司Audio separation method, device, equipment and computer readable storage medium
CN110827850A (en)*2019-11-112020-02-21广州国音智能科技有限公司Audio separation method, device, equipment and computer readable storage medium
CN111128121A (en)*2019-12-202020-05-08贝壳技术有限公司 Voice information generation method and device, electronic device and storage medium
CN111540353A (en)*2020-04-162020-08-14重庆农村商业银行股份有限公司Semantic understanding method, device, equipment and storage medium
CN111540353B (en)*2020-04-162022-11-15重庆农村商业银行股份有限公司Semantic understanding method, device, equipment and storage medium
CN112256871A (en)*2020-10-162021-01-22国网江苏省电力有限公司连云港供电分公司Material fulfillment system and method
CN112256871B (en)*2020-10-162021-05-07国网江苏省电力有限公司连云港供电分公司 A material contract performance system and method
CN112699237B (en)*2020-12-242021-10-15百度在线网络技术(北京)有限公司Label determination method, device and storage medium
CN112699237A (en)*2020-12-242021-04-23百度在线网络技术(北京)有限公司Label determination method, device and storage medium
CN113709346A (en)*2021-08-272021-11-26北京八分量信息科技有限公司Big data monitoring system based on decentralization
CN116318716A (en)*2023-02-172023-06-23支付宝(杭州)信息技术有限公司 Authentication method, device, storage medium, electronic equipment and product
CN118261613A (en)*2024-04-162024-06-28湖南三湘银行股份有限公司 An AI-based intelligent marketing and identity authentication method and device
CN118261613B (en)*2024-04-162025-04-25湖南三湘银行股份有限公司Intelligent marketing and identity authentication method and device based on AI

Also Published As

Publication numberPublication date
CN109446376B (en)2021-06-25

Similar Documents

PublicationPublication DateTitle
CN109446376A (en)Method and system for classifying voice through word segmentation
CN101620596B (en)Multi-document auto-abstracting method facing to inquiry
US20140304267A1 (en)Suffix tree similarity measure for document clustering
CN110457696A (en)A kind of talent towards file data and policy intelligent Matching system and method
CN105824959A (en)Public opinion monitoring method and system
US20130325864A1 (en)Systems and methods for building a universal multimedia learner
CN103425640A (en)Multimedia questioning-answering system and method
CN114547370B (en) Video summary extraction method and system
CN104933113A (en)Expression input method and device based on semantic understanding
CN113312474A (en)Similar case intelligent retrieval system of legal documents based on deep learning
US20040163035A1 (en)Method for automatic and semi-automatic classification and clustering of non-deterministic texts
CN109241332A (en)Method and system for determining semantics through voice
CN1270361A (en)Method and device for audio information searching by content and loudspeaker information
CN108664599A (en)Intelligent answer method, apparatus, intelligent answer server and storage medium
CN105718585B (en) Document and tag word semantic association method and device
CN118503390B (en)Automatic optimization method and system based on intelligent data memory bank
CN109308324A (en) An image retrieval method and system based on hand-drawn style recommendation
CN109446399A (en)A kind of video display entity search method
CN113282834A (en)Web search intelligent ordering method, system and computer storage medium based on mobile internet data deep mining
CN115618014A (en)Standard document analysis management system and method applying big data technology
CN117573882A (en)Agricultural multi-mode intelligent retrieval technology and system based on multi-source heterogeneous data
CN110970112A (en)Method and system for constructing knowledge graph for nutrition and health
CN109215636A (en)Voice information classification method and system
CN107784024B (en)Construct the method and device of party's portrait
CN109766442A (en)method and system for classifying user notes

Legal Events

DateCodeTitleDescription
PB01Publication
PB01Publication
SE01Entry into force of request for substantive examination
SE01Entry into force of request for substantive examination
GR01Patent grant
GR01Patent grant
