Summary of the invention
The object of the present invention is to provide a method and system for classifying speech by means of participles, so that the content of a speech audio can be classified rapidly and accurately through participles, and speech audios can be stored in an orderly manner for convenient subsequent retrieval.
The technical solution provided by the present invention is as follows:
The present invention provides a method for classifying speech by means of participles, characterized by comprising:
obtaining a corpus sample database, and establishing an audio repository and a semantic slot according to the corpus samples in the corpus sample database;
obtaining a speech audio;
comparing the speech audio with the participle audios in the audio repository, and generating, in the speech audio, participle clip audios that match;
merging identical participle clip audios, and counting the frequency with which each merged participle clip audio occurs in the speech audio;
obtaining, according to the semantic slot, the participle segment semantics corresponding to the participle clip audios;
choosing, according to the participle segment semantics and the frequencies, one or more semantics as classification tags of the speech audio;
classifying the speech audio according to the classification tags.
Further, the obtaining a corpus sample database and establishing an audio repository and a semantic slot according to the corpus samples in the corpus sample database specifically includes:
obtaining the corpus sample database, and segmenting the corpus samples in the corpus sample database according to a word segmentation technique to obtain the participles contained in the corpus samples;
obtaining the participle audios corresponding to the participles, and establishing the audio repository according to the participle audios and the corresponding participles;
obtaining the participle semantics corresponding to the participles, and establishing the semantic slot according to the participle semantics and the participles.
Further, the obtaining a corpus sample database and establishing an audio repository and a semantic slot according to the corpus samples in the corpus sample database further includes:
obtaining the corpus sample semantics corresponding to the corpus samples and the parts of speech corresponding to the participles;
parsing the sentence structure of the corpus samples by combining the corpus sample semantics, the participle semantics and the parts of speech;
if a participle belongs to a keyword in the sentence structure, marking that participle as a key participle.
Further, the choosing, according to the participle segment semantics and the frequencies, the semantics corresponding to one or more semantic sets as classification tags of the speech audio specifically includes:
forming corresponding semantic sets according to the participle segment semantics, and merging semantic sets whose semantics are identical or similar;
choosing, in combination with the frequencies with which the participle clip audios occur in the speech audio, the semantics corresponding to one or more merged semantic sets as classification tags of the speech audio.
Further, before the choosing, in combination with the frequencies with which the participle clip audios occur in the speech audio, of the semantics corresponding to one or more merged semantic sets as classification tags of the speech audio, the method further includes:
obtaining, according to the audio repository, the target participles corresponding to the participle clip audios;
judging whether the target participles include a key participle.
The choosing, in combination with the frequencies with which the participle clip audios occur in the speech audio, the semantics corresponding to one or more merged semantic sets as classification tags of the speech audio specifically includes:
if the target participles include a key participle, choosing the semantic set corresponding to the key participle;
choosing, in combination with the frequencies with which the key participles occur in the speech audio, the semantics corresponding to the semantic sets of one or more key participles as classification tags of the speech audio.
The present invention also provides a system for classifying speech by means of participles, characterized by comprising:
a database module, which obtains a corpus sample database and establishes an audio repository and a semantic slot according to the corpus samples in the corpus sample database;
a voice acquisition module, which obtains a speech audio;
a matching module, which compares the speech audio obtained by the voice acquisition module with the participle audios in the audio repository established by the database module, and generates, in the speech audio, participle clip audios that match;
a processing module, which merges the identical participle clip audios obtained by the matching module and counts the frequency with which each merged participle clip audio occurs in the speech audio;
a semantic acquisition module, which obtains, according to the semantic slot established by the database module, the participle segment semantics corresponding to the participle clip audios obtained by the matching module;
an analysis module, which chooses, according to the participle segment semantics obtained by the semantic acquisition module and the frequencies counted by the processing module, one or more semantics as classification tags of the speech audio;
a classification module, which classifies the speech audio according to the classification tags chosen by the analysis module.
Further, the database module specifically includes:
a participle unit, which obtains the corpus sample database and segments the corpus samples in the corpus sample database according to a word segmentation technique to obtain the participles contained in the corpus samples;
an acquiring unit, which obtains the participle audios corresponding to the participles obtained by the participle unit;
a database unit, which establishes the audio repository according to the participle audios obtained by the acquiring unit and the corresponding participles obtained by the participle unit;
the acquiring unit further obtains the participle semantics corresponding to the participles obtained by the participle unit;
the database unit further establishes the semantic slot according to the participle semantics obtained by the acquiring unit and the corresponding participles obtained by the participle unit.
Further, the database module further includes:
the acquiring unit further obtains the corpus sample semantics corresponding to the corpus samples and the parts of speech corresponding to the participles obtained by the participle unit;
a parsing unit, which parses the sentence structure of the corpus samples by combining the corpus sample semantics obtained by the acquiring unit, the participle semantics and the parts of speech;
a marking unit, which marks a participle as a key participle if the parsing unit determines that the participle belongs to a keyword in the sentence structure.
Further, the analysis module specifically includes:
a merging unit, which forms corresponding semantic sets according to the participle segment semantics and merges semantic sets whose semantics are identical or similar;
an analysis unit, which chooses, in combination with the frequencies with which the participle clip audios occur in the speech audio, the semantics corresponding to one or more semantic sets merged by the merging unit as classification tags of the speech audio.
Further, the analysis module further includes:
a target participle acquiring unit, which obtains, according to the audio repository, the target participles corresponding to the participle clip audios;
a judging unit, which judges whether the target participles obtained by the target participle acquiring unit include a key participle.
The analysis unit specifically includes:
a selection subunit, which chooses the semantic set corresponding to the key participle if the judging unit determines that the target participles include the key participle;
an analysis subunit, which chooses, in combination with the frequencies with which the key participles occur in the speech audio, the semantics corresponding to the semantic sets of one or more key participles chosen by the selection subunit as classification tags of the speech audio.
The method and system for classifying speech by means of participles provided by the present invention can bring at least one of the following beneficial effects:
1. In the present invention, a corpus sample database is formed by collecting a large number of corpus samples, and an audio repository and a semantic slot are then established, so that a subsequently acquired speech audio can be matched against them and the classification tags corresponding to that speech audio can be obtained.
2. In the present invention, identical participle clip audios in the speech audio are merged, which narrows the range from which the classification tags of the speech audio are chosen.
3. In the present invention, the classification tags of the speech audio are chosen in combination with the frequency with which each participle clip audio occurs in the speech audio, ensuring that the chosen classification tags represent the intention of the speech audio to the greatest extent.
4. In the present invention, the classification tags of a speech audio are chosen intelligently through participles and the speech audio is then classified, so that all speech audios are stored in an orderly manner and a target speech audio can be found quickly and accurately later.
Specific embodiment
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, specific embodiments of the present invention are described below with reference to the accompanying drawings. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings and other embodiments can be obtained from these drawings without creative effort.
For the sake of a simplified form, each figure only schematically shows the parts relevant to the present invention, and they do not represent the actual structure of a product. In addition, where components with the same structure or function appear in a figure, only one of them is schematically depicted or marked so that the figure remains easy to understand. Herein, "one" not only means "only this one" but may also mean "more than one".
In the first embodiment of the present invention, as shown in Figure 1, a method for classifying speech by means of participles comprises:
S100: obtaining a corpus sample database, and establishing an audio repository and a semantic slot according to the corpus samples in the corpus sample database.
Specifically, a large number of corpus samples are collected to build the corpus sample database; all corpus samples are then analyzed to obtain the participles in the corpus samples together with their corresponding audios, semantics and so on, from which the audio repository and the semantic slot are established.
S200: obtaining a speech audio.
Specifically, a speech audio is obtained. It may be speech input by the user in real time; for example, while exchanging with another user by voice, the user may feel that one or more of the utterances involve valuable information that may be needed later and therefore needs to be saved, and in order to be able to find and review it later, it needs to be stored under a classification.
It may also be a downloaded or recorded audio; for example, when the amount of recorded audio is large and the user does not have enough time to go through it piece by piece, all the audios need to be classified so that the audio the user needs can be found quickly and accurately among the many recordings.
S300: comparing the speech audio with the participle audios in the audio repository, and generating, in the speech audio, participle clip audios that match.
Specifically, the acquired speech audio is matched, one by one, against the participle audios in the audio repository that was built from the large number of corpus samples. When some part of the speech audio matches a certain participle audio in the audio repository, the participle clip audio corresponding to that part is generated in the speech audio, so that the speech audio is split into multiple participle clip audios.
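How step S300 is realized is not limited here. Purely for illustration, the Python sketch below slides every participle audio template over the speech audio and records the matching clips; the frame values, the distance measure and the threshold are made-up placeholders, whereas a real system would rather use acoustic features such as MFCCs and dynamic time warping.

def frame_distance(a, b):
    # Mean squared difference between two equal-length frame sequences.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def match_participle_clips(speech_frames, audio_repository, threshold=0.05):
    """Return (participle, start, end) triples for every repository audio
    whose template matches a window of the speech audio closely enough."""
    clips = []
    for participle, templates in audio_repository.items():
        for template in templates:            # one participle may have several audios
            width = len(template)
            for start in range(len(speech_frames) - width + 1):
                window = speech_frames[start:start + width]
                if frame_distance(window, template) < threshold:
                    clips.append((participle, start, start + width))
    return clips

repo = {"what": [[0.2, 0.4]], "animal": [[0.9, 0.7, 0.8]]}   # toy templates
speech = [0.2, 0.4, 0.9, 0.7, 0.8]
print(match_participle_clips(speech, repo))   # [('what', 0, 2), ('animal', 2, 5)]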
S400: merging identical participle clip audios, and counting the frequency with which each merged participle clip audio occurs in the speech audio.
Specifically, all participle clip audios split out of the speech audio are identified, identical participle clip audios are merged, and the frequency with which each merged participle clip audio occurs in the speech audio is then counted; a merged participle clip audio is still counted according to its quantity before merging.
For example, 10 participle clip audios are split out of a certain speech audio, among which the participle clip audio "animal" occurs 5 times, "what" occurs 3 times and "yes" occurs 2 times. After identical participle clip audios are merged, 3 participle clip audios are obtained: the frequency of "animal" is 0.5, the frequency of "what" is 0.3, and the frequency of "yes" is 0.2.
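A minimal sketch of the merging and counting of step S400, reproducing the example above (the clip positions are dummy values and only serve to keep the same data shape as the matching sketch):

from collections import Counter

def clip_frequencies(clips):
    """Merge identical participle clip audios and return, for each participle,
    its relative frequency within the speech audio (count / total clips)."""
    counts = Counter(participle for participle, _, _ in clips)
    total = sum(counts.values())
    return {participle: count / total for participle, count in counts.items()}

example = [("animal", 0, 0)] * 5 + [("what", 0, 0)] * 3 + [("yes", 0, 0)] * 2
print(clip_frequencies(example))   # {'animal': 0.5, 'what': 0.3, 'yes': 0.2}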
S500: obtaining, according to the semantic slot, the participle segment semantics corresponding to the participle clip audios.
Specifically, when some part of the speech audio matches a certain participle audio in the audio repository, the participle clip audio corresponding to that part is generated; the participle corresponding to that participle clip audio is then obtained from the audio repository, and the participle segment semantics corresponding to the participle clip audio are obtained from that participle and the semantic slot.
S600: choosing, according to the participle segment semantics and the frequencies, one or more semantics as classification tags of the speech audio.
Specifically, according to the participle segment semantics corresponding to the participle clip audios, and in combination with the frequency with which each participle clip audio occurs in the speech audio, the participle segment semantics are arranged in descending order of their frequencies, and the one or more semantics ranked first are chosen as the classification tags of the speech audio.
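One possible rendering of step S600, assuming the semantics and frequencies are already available as mappings; the cut-off top_k is an assumed parameter, since the text only requires "one or more" semantics to be kept:

def choose_classification_tags(segment_semantics, frequencies, top_k=2):
    """Rank the participle segment semantics by the frequency of the
    corresponding clip audio and keep the top_k semantics as tags."""
    ranked = sorted(segment_semantics.items(),
                    key=lambda item: frequencies.get(item[0], 0.0),
                    reverse=True)
    return [semantics for _, semantics in ranked[:top_k]]

semantics = {"animal": "animal", "what": "question word", "yes": "affirmation"}
frequencies = {"animal": 0.5, "what": 0.3, "yes": 0.2}
print(choose_classification_tags(semantics, frequencies))  # ['animal', 'question word']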
The above speech classification method lets the system analyze the acquired speech audio and then classify it intelligently; likewise, the user may also select the classification tags of the speech audio according to his or her own understanding.
S700: classifying the speech audio according to the classification tags.
Specifically, the classification tags corresponding to the speech audio are obtained; whether the classification tags were chosen intelligently by the system or selected independently by the user, the acquired speech audio is stored under those classification tags, which facilitates subsequent retrieval.
In this embodiment, a corpus sample database is formed by collecting a large number of corpus samples, and an audio repository and a semantic slot are then established, so that a subsequently acquired speech audio can be matched against them and the classification tags corresponding to that speech audio can be obtained.
In the present invention, the classification tags of a speech audio are chosen intelligently through participles and the speech audio is then classified, so that all speech audios are stored in an orderly manner and a target speech audio can be found quickly and accurately later.
Merging identical participle clip audios in the speech audio narrows the range from which the classification tags of the speech audio are chosen; in addition, choosing the classification tags in combination with the frequency with which each participle clip audio occurs in the speech audio ensures that the chosen classification tags represent the intention of the speech audio to the greatest extent.
The second embodiment of the present invention is a preferred implementation of the first embodiment above and, as shown in Figure 2, comprises:
S110: obtaining a corpus sample database, and segmenting the corpus samples in the corpus sample database according to a word segmentation technique to obtain the participles contained in the corpus samples.
Specifically, a large number of corpus samples are collected to build the corpus sample database. Corpus samples are not limited to written text; they also include speech, audio and the like, the difference being that corpus samples such as speech and audio must first be converted into corresponding text information before subsequent processing.
The corpus samples are segmented according to the word segmentation technique: the structure of each sentence in a corpus sample is judged, the part of speech of every word in every sentence is identified, and each sentence is then divided, according to the parts of speech of its words, into participles consisting of characters, words and phrases. The participles contained in the corpus samples and their corresponding parts of speech are thereby obtained.
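The word segmentation technique itself is not limited here. As a toy illustration only, the sketch below uses forward maximum matching against an assumed vocabulary; a real segmenter would also attach a part of speech to every participle, which is omitted here.

def segment_sentence(sentence, vocabulary, max_len=6):
    """Toy forward-maximum-matching segmentation: at every position take the
    longest vocabulary entry that matches, else fall back to one character."""
    participles, i = [], 0
    while i < len(sentence):
        for width in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + width]
            if width == 1 or candidate in vocabulary:
                participles.append(candidate)
                i += width
                break
    return participles

print(segment_sentence("whatanimalisit", {"what", "animal", "is", "it"}))
# ['what', 'animal', 'is', 'it']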
S120: obtaining the participle audios corresponding to the participles, and establishing the audio repository according to the participle audios and the corresponding participles.
Specifically, the audio corresponding to each participle is obtained. Because of factors such as the user's age and accent, one participle may correspond to multiple audios, so as many different audios of the same participle as possible are collected so that the user's speech can later be recognized comprehensively and omissions avoided. The audio repository is then established from all the audios, and the correspondence between participles and audios is established in the audio repository.
S130: obtaining the participle semantics corresponding to the participles, and establishing the semantic slot according to the participle semantics and the participles.
Specifically, all the participles contained in all the above corpus samples are obtained, the semantic slot is established according to all the participles and their corresponding participle semantics, and the correspondence between the participles and the participle semantics is established in the semantic slot.
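One way the two lookup structures of steps S120 and S130 could be held in memory; the participles, file names and semantics below are placeholders chosen only for illustration. Step S500 then reduces to a lookup in the semantic slot.

audio_repository = {
    "animal": ["animal_child.wav", "animal_adult.wav"],   # several accents/ages
    "what": ["what_standard.wav"],
}

semantic_slot = {
    "animal": "animal",          # participle -> participle semantics
    "what": "question word",
}

def participle_segment_semantics(participle):
    """Look up the semantics of a participle matched in the speech audio."""
    return semantic_slot.get(participle)

print(participle_segment_semantics("animal"))   # animal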
S140: obtaining the corpus sample semantics corresponding to the corpus samples and the parts of speech corresponding to the participles.
S150: parsing the sentence structure of the corpus samples by combining the corpus sample semantics, the participle semantics and the parts of speech.
S160: if a participle belongs to a keyword in the sentence structure, marking that participle as a key participle.
Specifically, the corpus sample semantics corresponding to each corpus sample and the part of speech corresponding to each participle are obtained, and the sentence structure of the corpus sample is parsed by combining the corpus sample semantics, the participle semantics and the parts of speech of the participles.
The part of speech of each participle is judged first: if the part of speech of a participle belongs to a part of speech without substantive meaning, such as a conjunction, that participle has little influence on the corpus sample semantics, so such participles can be excluded first.
Next, the influence of each participle's semantics on the corpus sample semantics is judged: if, after deleting a participle, the corpus sample semantics can still be understood, the participle is not important; otherwise the participle is a keyword for understanding the corpus sample semantics. Finally, the participles determined to be keywords are marked as key participles.
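A sketch of steps S140 to S160 under stated assumptions: the set of function-word parts of speech and the still_understandable callback are placeholders, because the text leaves open how the "can the sample still be understood without this participle" check is computed.

FUNCTION_WORD_POS = {"conjunction", "particle", "interjection"}   # assumed labels

def mark_key_participles(participles, parts_of_speech, still_understandable):
    """A participle is marked as a key participle when its part of speech
    carries substantive meaning AND deleting it would make the corpus sample
    semantics impossible to understand."""
    key_participles = set()
    for participle in participles:
        if parts_of_speech.get(participle) in FUNCTION_WORD_POS:
            continue                          # function words are excluded first
        if not still_understandable(participle):
            key_participles.add(participle)   # deleting it breaks the meaning
    return key_participles

pos = {"and": "conjunction", "animal": "noun", "what": "pronoun"}
print(mark_key_participles(["and", "animal", "what"], pos,
                           still_understandable=lambda w: w != "animal"))
# {'animal'}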
S200: obtaining a speech audio.
S300: comparing the speech audio with the participle audios in the audio repository, and generating, in the speech audio, participle clip audios that match.
S400: merging identical participle clip audios, and counting the frequency with which each merged participle clip audio occurs in the speech audio.
S500: obtaining, according to the semantic slot, the participle segment semantics corresponding to the participle clip audios.
S600: choosing, according to the participle segment semantics and the frequencies, one or more semantics as classification tags of the speech audio.
S700: classifying the speech audio according to the classification tags.
In this embodiment, the corpus samples are segmented according to a word segmentation technique to establish the audio repository and the semantic slot, and the sentence structure of the corpus samples is parsed by combining the corpus sample semantics, the participle semantics and the parts of speech of the participles so as to determine the key participles, which makes it easier to recognize a speech audio and choose the corresponding classification tags later.
The third embodiment of the present invention is a preferred implementation of the first and second embodiments above and, as shown in Figures 3 and 4, comprises:
S100: obtaining a corpus sample database, and establishing an audio repository and a semantic slot according to the corpus samples in the corpus sample database.
S200: obtaining a speech audio.
S300: comparing the speech audio with the participle audios in the audio repository, and generating, in the speech audio, participle clip audios that match.
S400: merging identical participle clip audios, and counting the frequency with which each merged participle clip audio occurs in the speech audio.
S500: obtaining, according to the semantic slot, the participle segment semantics corresponding to the participle clip audios.
S610: forming corresponding semantic sets according to the participle segment semantics, and merging semantic sets whose semantics are identical or similar.
Specifically, corresponding semantic sets are formed according to the participle segment semantics, each participle segment semantics forming one semantic set. The semantics of every semantic set are then identified, semantic sets whose semantics are identical or similar are merged, and the semantic set remaining after merging is any one of the semantic sets that were merged together.
For example, the semantic sets "cup" and "teacup" can be merged, and the semantic set remaining after merging is "cup" or "teacup"; before merging, the probability of "cup" occurring in the speech audio is 0.3 and that of "teacup" is 0.1, so the probability of the remaining semantic set after merging is 0.4.
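A sketch of step S610, reproducing the example above; how semantic similarity is actually judged is not specified in the text, so the are_similar test below is an assumed placeholder.

def merge_semantic_sets(set_probabilities, are_similar):
    """Merge semantic sets judged identical or similar and add their
    occurrence probabilities, as in the "cup"/"teacup" example."""
    merged = {}
    for semantics, probability in set_probabilities.items():
        for kept in merged:
            if are_similar(semantics, kept):
                merged[kept] += probability   # fold into an existing set
                break
        else:
            merged[semantics] = probability   # keep as a new semantic set
    return merged

probabilities = {"cup": 0.3, "teacup": 0.1, "animal": 0.6}
similar = lambda a, b: {a, b} == {"cup", "teacup"}
print(merge_semantic_sets(probabilities, similar))   # {'cup': 0.4, 'animal': 0.6}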
S620: obtaining, according to the audio repository, the target participles corresponding to the participle clip audios.
S630: judging whether the target participles include a key participle.
Specifically, the target participle corresponding to each participle clip audio is determined according to the correspondence between participles and participle audios in the audio repository, and it is then judged whether the target participles include one of the above key participles; if so, the key participle can to a certain extent represent the semantics of the speech audio.
S640: choosing, in combination with the frequencies with which the participle clip audios occur in the speech audio, the semantics corresponding to one or more merged semantic sets as classification tags of the speech audio.
Specifically, if the target participles include a key participle, the key participle can to a certain extent represent the semantics of the speech audio, so the classification tags are chosen in combination with the frequency with which the key participle occurs in the speech audio. If the target participles include no key participle, the classification tags are chosen according to the frequencies with which the target participles occur in the speech audio.
S640, choosing, in combination with the frequencies with which the participle clip audios occur in the speech audio, the semantics corresponding to one or more merged semantic sets as classification tags of the speech audio, specifically includes:
S641: if the target participles include a key participle, choosing the semantic set corresponding to the key participle.
S642: choosing, in combination with the frequencies with which the key participles occur in the speech audio, the semantics corresponding to the semantic sets of one or more key participles as classification tags of the speech audio.
Specifically, if the target participles include a key participle, the semantic set corresponding to the key participle is chosen; in combination with the frequency with which the key participle occurs in the speech audio, the semantic sets corresponding to the key participles are arranged in order of frequency, and the semantics corresponding to the one or more semantic sets ranked first are chosen as the classification tags.
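A sketch of steps S620 to S642 under the same assumptions as the earlier sketches (semantic_slot and frequencies as mappings, top_k as an assumed cut-off): when key participles are matched, only they are ranked; otherwise all target participles are ranked.

def choose_tags_with_key_participles(target_participles, key_participles,
                                     semantic_slot, frequencies, top_k=2):
    """Prefer key participles when choosing classification tags; fall back to
    all target participles when no key participle was matched."""
    candidates = [p for p in target_participles if p in key_participles]
    if not candidates:                         # no key participle was matched
        candidates = list(target_participles)
    candidates.sort(key=lambda p: frequencies.get(p, 0.0), reverse=True)
    return [semantic_slot[p] for p in candidates[:top_k] if p in semantic_slot]

slot = {"animal": "animal", "what": "question word", "yes": "affirmation"}
freqs = {"animal": 0.5, "what": 0.3, "yes": 0.2}
print(choose_tags_with_key_participles(["animal", "what", "yes"], {"animal"},
                                       slot, freqs))
# ['animal']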
S700: classifying the speech audio according to the classification tags.
In this embodiment, the speech audio is compared with the audio repository and the semantic slot established from the corpus samples, so that the participle clip audios obtained in the speech audio correspond to target participles; the target participles are then matched against the key participles, and the classification tags are chosen in combination with the probabilities with which the participle clip audios occur in the speech audio, whereby the speech audio is classified.
In the fourth embodiment of the present invention, as shown in Figure 5, a system 1000 for classifying speech by means of participles comprises:
a database module 1100, which obtains a corpus sample database and establishes an audio repository and a semantic slot according to the corpus samples in the corpus sample database.
Specifically, the database module 1100 collects a large number of corpus samples to build the corpus sample database, then analyzes all corpus samples to obtain the participles in the corpus samples together with their corresponding audios, semantics and so on, from which the audio repository and the semantic slot are established.
a voice acquisition module 1200, which obtains a speech audio.
Specifically, the voice acquisition module 1200 obtains a speech audio. It may be speech input by the user in real time; for example, while exchanging with another user by voice, the user may feel that one or more of the utterances involve valuable information that may be needed later and therefore needs to be saved, and in order to be able to find and review it later, it needs to be stored under a classification.
It may also be a downloaded or recorded audio; for example, when the amount of recorded audio is large and the user does not have enough time to go through it piece by piece, all the audios need to be classified so that the audio the user needs can be found quickly and accurately among the many recordings.
a matching module 1300, which compares the speech audio obtained by the voice acquisition module 1200 with the participle audios in the audio repository established by the database module 1100, and generates, in the speech audio, participle clip audios that match.
Specifically, the matching module 1300 matches the acquired speech audio, one by one, against the participle audios in the audio repository that was built from the large number of corpus samples. When some part of the speech audio matches a certain participle audio in the audio repository, the participle clip audio corresponding to that part is generated in the speech audio, so that the speech audio is split into multiple participle clip audios.
a processing module 1400, which merges the identical participle clip audios obtained by the matching module 1300 and counts the frequency with which each merged participle clip audio occurs in the speech audio.
Specifically, the processing module 1400 identifies all participle clip audios split out of the speech audio, merges identical participle clip audios, and then counts the frequency with which each merged participle clip audio occurs in the speech audio; a merged participle clip audio is still counted according to its quantity before merging.
For example, 10 participle clip audios are split out of a certain speech audio, among which the participle clip audio "animal" occurs 5 times, "what" occurs 3 times and "yes" occurs 2 times. After identical participle clip audios are merged, 3 participle clip audios are obtained: the frequency of "animal" is 0.5, the frequency of "what" is 0.3, and the frequency of "yes" is 0.2.
a semantic acquisition module 1500, which obtains, according to the semantic slot established by the database module 1100, the participle segment semantics corresponding to the participle clip audios obtained by the matching module 1300.
Specifically, when some part of the speech audio matches a certain participle audio in the audio repository, the participle clip audio corresponding to that part is generated; the participle corresponding to that participle clip audio is then obtained from the audio repository, and the semantic acquisition module 1500 obtains the participle segment semantics corresponding to the participle clip audio from that participle and the semantic slot.
an analysis module 1600, which chooses, according to the participle segment semantics obtained by the semantic acquisition module 1500 and the frequencies counted by the processing module 1400, one or more semantics as classification tags of the speech audio.
Specifically, according to the participle segment semantics corresponding to the participle clip audios, and in combination with the frequency with which each participle clip audio occurs in the speech audio, the analysis module 1600 arranges the participle segment semantics in descending order of their frequencies and chooses the one or more semantics ranked first as the classification tags of the speech audio.
The above speech classification approach lets the system analyze the acquired speech audio and then classify it intelligently; likewise, the user may also select the classification tags of the speech audio according to his or her own understanding.
a classification module 1700, which classifies the speech audio according to the classification tags chosen by the analysis module 1600.
Specifically, the classification module 1700 obtains the classification tags corresponding to the speech audio; whether the classification tags were chosen intelligently by the system or selected independently by the user, the acquired speech audio is stored under those classification tags, which facilitates subsequent retrieval.
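For illustration, the modules of system 1000 could be composed as sketched below; the class and method names are assumptions made only for the sketch and are not part of the claimed wording (the voice acquisition module 1200 would supply the speech_audio passed in).

class SpeechClassificationSystem:
    def __init__(self, database_module, matching_module, processing_module,
                 semantic_module, analysis_module, classification_module):
        self.database = database_module          # module 1100
        self.matcher = matching_module           # module 1300
        self.processor = processing_module       # module 1400
        self.semantics = semantic_module         # module 1500
        self.analyzer = analysis_module          # module 1600
        self.classifier = classification_module  # module 1700

    def classify(self, speech_audio):
        clips = self.matcher.match(speech_audio, self.database.audio_repository)
        frequencies = self.processor.merge_and_count(clips)
        meanings = self.semantics.lookup(clips, self.database.semantic_slot)
        tags = self.analyzer.choose_tags(meanings, frequencies)
        return self.classifier.classify(speech_audio, tags)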
In this embodiment, a corpus sample database is formed by collecting a large number of corpus samples, and an audio repository and a semantic slot are then established, so that a subsequently acquired speech audio can be matched against them and the classification tags corresponding to that speech audio can be obtained.
In the present invention, the classification tags of a speech audio are chosen intelligently through participles and the speech audio is then classified, so that all speech audios are stored in an orderly manner and a target speech audio can be found quickly and accurately later.
Merging identical participle clip audios in the speech audio narrows the range from which the classification tags of the speech audio are chosen; in addition, choosing the classification tags in combination with the frequency with which each participle clip audio occurs in the speech audio ensures that the chosen classification tags represent the intention of the speech audio to the greatest extent.
The fifth embodiment of the present invention is a preferred implementation of the fourth embodiment above and, as shown in Figure 6, comprises:
a database module 1100, which obtains a corpus sample database and establishes an audio repository and a semantic slot according to the corpus samples in the corpus sample database.
The database module 1100 specifically includes:
a participle unit 1110, which obtains the corpus sample database and segments the corpus samples in the corpus sample database according to a word segmentation technique to obtain the participles contained in the corpus samples.
Specifically, the participle unit 1110 collects a large number of corpus samples to build the corpus sample database. Corpus samples are not limited to written text; they also include speech, audio and the like, the difference being that corpus samples such as speech and audio must first be converted into corresponding text information before subsequent processing.
The corpus samples are segmented according to the word segmentation technique: the structure of each sentence in a corpus sample is judged, the part of speech of every word in every sentence is identified, and each sentence is then divided, according to the parts of speech of its words, into participles consisting of characters, words and phrases. The participles contained in the corpus samples and their corresponding parts of speech are thereby obtained.
an acquiring unit 1120, which obtains the participle audios corresponding to the participles obtained by the participle unit 1110.
a database unit 1130, which establishes the audio repository according to the participle audios obtained by the acquiring unit 1120 and the corresponding participles obtained by the participle unit 1110.
Specifically, the acquiring unit 1120 obtains the audio corresponding to each participle. Because of factors such as the user's age and accent, one participle may correspond to multiple audios, so as many different audios of the same participle as possible are collected so that the user's speech can later be recognized comprehensively and omissions avoided. The database unit 1130 then establishes the audio repository from all the audios, and the correspondence between participles and audios is established in the audio repository.
The acquiring unit 1120 further obtains the participle semantics corresponding to the participles obtained by the participle unit 1110.
The database unit 1130 further establishes the semantic slot according to the participle semantics obtained by the acquiring unit 1120 and the corresponding participles obtained by the participle unit 1110.
Specifically, the acquiring unit 1120 obtains all the participles contained in all the above corpus samples, the database unit 1130 establishes the semantic slot according to all the participles and their corresponding participle semantics, and the correspondence between the participles and the participle semantics is established in the semantic slot.
The database module 1100 further includes:
the acquiring unit 1120 further obtains the corpus sample semantics corresponding to the corpus samples and the parts of speech corresponding to the participles obtained by the participle unit 1110.
a parsing unit 1140, which parses the sentence structure of the corpus samples by combining the corpus sample semantics obtained by the acquiring unit 1120, the participle semantics and the parts of speech.
a marking unit 1150, which marks a participle as a key participle if the parsing unit 1140 determines that the participle belongs to a keyword in the sentence structure.
Specifically, the acquiring unit 1120 obtains the corpus sample semantics corresponding to each corpus sample and the part of speech corresponding to each participle, and the parsing unit 1140 then parses the sentence structure of the corpus sample by combining the corpus sample semantics, the participle semantics and the parts of speech of the participles.
The parsing unit 1140 first judges the part of speech of each participle: if the part of speech of a participle belongs to a part of speech without substantive meaning, such as a conjunction, that participle has little influence on the corpus sample semantics, so such participles can be excluded first.
Next, the parsing unit 1140 judges the influence of each participle's semantics on the corpus sample semantics: if, after deleting a participle, the corpus sample semantics can still be understood, the participle is not important; otherwise the participle is a keyword for understanding the corpus sample semantics. Finally, the marking unit 1150 marks the participles determined to be keywords as key participles.
a voice acquisition module 1200, which obtains a speech audio.
a matching module 1300, which compares the speech audio obtained by the voice acquisition module 1200 with the participle audios in the audio repository established by the database module 1100, and generates, in the speech audio, participle clip audios that match.
a processing module 1400, which merges the identical participle clip audios obtained by the matching module 1300 and counts the frequency with which each merged participle clip audio occurs in the speech audio.
a semantic acquisition module 1500, which obtains, according to the semantic slot established by the database module 1100, the participle segment semantics corresponding to the participle clip audios obtained by the matching module 1300.
an analysis module 1600, which chooses, according to the participle segment semantics obtained by the semantic acquisition module 1500 and the frequencies counted by the processing module 1400, the semantics corresponding to one or more semantic sets as classification tags of the speech audio.
a classification module 1700, which classifies the speech audio according to the classification tags chosen by the analysis module 1600.
In this embodiment, the corpus samples are segmented according to a word segmentation technique to establish the audio repository and the semantic slot, and the sentence structure of the corpus samples is parsed by combining the corpus sample semantics, the participle semantics and the parts of speech of the participles so as to determine the key participles, which makes it easier to recognize a speech audio and choose the corresponding classification tags later.
The sixth embodiment of the present invention is a preferred implementation of the fourth and fifth embodiments above and, as shown in Figure 7, comprises:
a database module 1100, which obtains a corpus sample database and establishes an audio repository and a semantic slot according to the corpus samples in the corpus sample database.
a voice acquisition module 1200, which obtains a speech audio.
a matching module 1300, which compares the speech audio obtained by the voice acquisition module 1200 with the participle audios in the audio repository established by the database module 1100, and generates, in the speech audio, participle clip audios that match.
a processing module 1400, which merges the identical participle clip audios obtained by the matching module 1300 and counts the frequency with which each merged participle clip audio occurs in the speech audio.
a semantic acquisition module 1500, which obtains, according to the semantic slot established by the database module 1100, the participle segment semantics corresponding to the participle clip audios obtained by the matching module 1300.
an analysis module 1600, which chooses, according to the participle segment semantics obtained by the semantic acquisition module 1500 and the frequencies counted by the processing module 1400, one or more semantics as classification tags of the speech audio.
The analysis module 1600 specifically includes:
a merging unit 1610, which forms semantic sets according to the participle segment semantics and merges semantic sets whose semantics are identical or similar.
Specifically, the merging unit 1610 forms corresponding semantic sets according to the participle segment semantics, each participle segment semantics forming one semantic set. The semantics of every semantic set are then identified, semantic sets whose semantics are identical or similar are merged, and the semantic set remaining after merging is any one of the semantic sets that were merged together.
For example, the semantic sets "cup" and "teacup" can be merged, and the semantic set remaining after merging is "cup" or "teacup"; before merging, the probability of "cup" occurring in the speech audio is 0.3 and that of "teacup" is 0.1, so the probability of the remaining semantic set after merging is 0.4.
a target participle acquiring unit 1620, which obtains, according to the audio repository, the target participles corresponding to the participle clip audios.
a judging unit 1630, which judges whether the target participles obtained by the target participle acquiring unit 1620 include a key participle.
Specifically, the target participle acquiring unit 1620 determines the target participle corresponding to each participle clip audio according to the correspondence between participles and participle audios in the audio repository, and the judging unit 1630 then judges whether the target participles include one of the above key participles; if so, the key participle can to a certain extent represent the semantics of the speech audio.
an analysis unit 1640, which chooses, in combination with the frequencies with which the participle clip audios occur in the speech audio, the semantics corresponding to one or more semantic sets merged by the merging unit 1610 as classification tags of the speech audio.
Specifically, if the analysis unit 1640 finds that the target participles include a key participle, the key participle can to a certain extent represent the semantics of the speech audio, so the classification tags are chosen in combination with the frequency with which the key participle occurs in the speech audio. If the target participles include no key participle, the classification tags are chosen according to the frequencies with which the target participles occur in the speech audio.
The analysis unit 1640 specifically includes:
a selection subunit 1641, which chooses the semantic set corresponding to the key participle if the judging unit 1630 determines that the target participles include the key participle.
an analysis subunit 1642, which chooses, in combination with the frequencies with which the key participles occur in the speech audio, the semantics corresponding to the semantic sets of one or more key participles chosen by the selection subunit 1641 as classification tags of the speech audio.
Specifically, if the target participles include a key participle, the selection subunit 1641 chooses the semantic set corresponding to the key participle; in combination with the frequency with which the key participle occurs in the speech audio, the analysis subunit 1642 arranges the semantic sets corresponding to the key participles in order of frequency and then chooses the semantics corresponding to the one or more semantic sets ranked first as the classification tags.
a classification module 1700, which classifies the speech audio according to the classification tags chosen by the analysis module 1600.
In this embodiment, the speech audio is compared with the audio repository and the semantic slot established from the corpus samples, so that the participle clip audios obtained in the speech audio correspond to target participles; the target participles are then matched against the key participles, and the classification tags are chosen in combination with the probabilities with which the participle clip audios occur in the speech audio, whereby the speech audio is classified.
It should be noted that the above embodiments can be freely combined as needed. The above are only preferred embodiments of the present invention; it should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can also be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.