CN110580908A - command word detection method and device supporting different languages - Google Patents

Command word detection method and device supporting different languages

Info

Publication number
CN110580908A
CN110580908A
Authority
CN
China
Prior art keywords
classification prediction
command word
probability
audio features
command
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910932340.0A
Other languages
Chinese (zh)
Inventor
匡方军
李深
雷欣
李志飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chumen Wenwen Information Technology Co Ltd
Original Assignee
Chumen Wenwen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chumen Wenwen Information Technology Co Ltd
Priority to CN201910932340.0A
Publication of CN110580908A
Legal status: Pending

Abstract

The invention discloses a command word detection method and device supporting different languages. The method first collects a speech signal containing at least two different languages; then extracts audio features from the speech signal; next performs classification prediction on the extracted audio features through a deep learning network to obtain a classification prediction result; and, if the classification prediction result shows that a command word has the highest probability, takes that command word as the output result.

Description

Command word detection method and device supporting different languages
Technical Field
The invention relates to speech recognition technology, and in particular to a method and device for detecting command words in different languages.
Background
With the continuous development of science and technology, voice interaction has been widely applied to embedded devices such as mobile phones, watches, smart speakers, and earphones. To reduce a device's operating power consumption, a specific command word is generally used to wake it up, such as "Hey Siri" on Apple phones or "Xiao Ai Tong Xue" on the Xiaomi AI speaker.
In the related art, a command word detection system usually performs feature extraction on the input speech signal and feeds the result to a deep learning network. The network outputs the probability of each word of the command phrase, and a post-processing module then derives the probability that the input speech signal contains the command word. When this probability exceeds a given threshold, the system judges that the input speech signal contains the command word; otherwise, it determines that no command word was detected. Clearly, such a method supports only a single language and the detection of only a single command word.
Disclosure of Invention
In order to overcome the defects of the current command word detection system, the embodiment of the invention creatively provides a command word detection method and device supporting different languages.
According to a first aspect of the present invention, there is provided a command word detection method supporting different languages, the method comprising: collecting a speech signal containing at least two different languages; extracting audio features of the speech signal; performing classification prediction on the extracted audio features through a deep learning network to obtain a classification prediction result; and, if the classification prediction result shows that a command word has the highest probability, taking that command word as the output result.
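The four operations can be sketched as a minimal pipeline. This is only an illustrative outline: `extract_features` and `classify` are hypothetical stand-ins for the Fbank/MFCC front end and the deep learning network described below, not part of the patent itself.

```python
def detect_command_word(speech_frames, extract_features, classify):
    """Run the four operations end to end: feature extraction, classification
    prediction through a network, and an argmax decision over the result.

    `classify` returns a dict mapping each supported command word, plus the
    special label "<non-command>", to a probability.
    """
    features = [extract_features(frame) for frame in speech_frames]  # operation 202
    probs = classify(features)                                       # operation 203
    best = max(probs, key=probs.get)                                 # operation 204
    return best if best != "<non-command>" else None
```

With a stub classifier, a non-command result yields `None` while a command result yields the winning word.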
According to an embodiment of the present invention, performing classification prediction on the extracted audio features through a deep learning network to obtain a classification prediction result includes: classifying the extracted audio features through a deep learning network supporting multiple languages to obtain the probabilities of command words and non-command words; and determining the category of the word with the highest probability among the obtained probabilities to obtain the classification prediction result.
According to an embodiment of the invention, the method further comprises: if the non-command class has the highest probability in the classification prediction result, judging that the speech signal does not contain a command word.
According to an embodiment of the present invention, extracting the audio feature of the speech signal includes: extracting Fbank characteristics of the voice signal; or, extracting MFCC features of the speech signal.
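To illustrate the Fbank option, the sketch below computes simplified log mel filterbank energies with NumPy. It is schematic rather than the exact front end of the invention: the framing parameters, filter count, FFT size, and triangular mel filters are common defaults assumed here.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Build triangular mel-spaced filters over the rFFT bins."""
    hz_to_mel = lambda hz: 2595.0 * np.log10(1.0 + hz / 700.0)
    mel_to_hz = lambda mel: 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, center):           # rising slope
            fb[i - 1, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):          # falling slope
            fb[i - 1, j] = (right - j) / max(right - center, 1)
    return fb

def fbank_features(signal, sample_rate=16000, frame_len=400, frame_step=160,
                   n_filters=40, n_fft=512):
    """Frame the signal, window it, take the power spectrum, apply the mel
    filterbank, and return log energies (one row per frame)."""
    frames = np.array([signal[i:i + frame_len]
                       for i in range(0, len(signal) - frame_len + 1, frame_step)])
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fb = mel_filterbank(n_filters, n_fft, sample_rate)
    return np.log(np.maximum(power @ fb.T, 1e-10))
```

MFCC features would add a DCT over these log energies; established libraries provide both variants ready-made.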
According to an embodiment of the present invention, performing classification prediction on the extracted audio features through a deep learning network includes: performing classification prediction by means of a recurrent neural network (RNN), a convolutional neural network (CNN), or a time delay neural network (TDNN).
According to an embodiment of the present invention, when performing classification prediction on the extracted audio features by means of a CNN, the method further includes: caching the results of intermediate nodes from the previous N rounds of classification prediction, where N is a positive integer; correspondingly, performing classification prediction on the extracted audio features by means of the CNN includes: taking the cached intermediate-node results of the previous N rounds together with the audio features of the current round as the input of the CNN to perform classification prediction.
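A minimal sketch of this caching scheme, assuming a streaming setting in which the network runs once per round; `run_cnn` is a hypothetical stand-in that returns both the class probabilities and the intermediate-node results to cache.

```python
from collections import deque

class StreamingCNNCache:
    """Keep the intermediate-node outputs of the previous N rounds so each
    new round feeds only the current features plus the cache to the CNN."""

    def __init__(self, n_rounds, run_cnn):
        self.cache = deque(maxlen=n_rounds)  # previous N rounds' intermediate results
        self.run_cnn = run_cnn               # hypothetical network callable

    def predict(self, current_features):
        # Cached intermediate results + this round's features form the input.
        probs, intermediate = self.run_cnn(list(self.cache), current_features)
        self.cache.append(intermediate)      # remember for the next round
        return probs
```

The `deque(maxlen=N)` automatically evicts the oldest round, keeping memory bounded regardless of how long the stream runs.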
According to an embodiment of the invention, the method further comprises: controlling execution of the operation corresponding to the command word with the highest probability.
According to a second aspect of the present invention, there is also provided a command word detection apparatus supporting different languages, the apparatus including: an acquisition module for collecting a speech signal containing at least two different languages; a feature extraction module for extracting audio features of the speech signal; a classification prediction module for performing classification prediction on the extracted audio features through a deep learning network to obtain a classification prediction result; and an output module for taking the command word with the highest probability as the output result if the classification prediction result shows that a command word has the highest probability.
According to an embodiment of the present invention, the classification prediction module includes: a classification unit for classifying the extracted audio features through a deep learning network supporting multiple languages to obtain the probabilities of command words and non-command words; and a judging unit for determining the category of the word with the highest probability among the obtained probabilities to obtain the classification prediction result.
According to an embodiment of the present invention, the output module is further configured to: if the non-command class has the highest probability in the classification prediction result, judge that the speech signal does not contain a command word.
According to an embodiment of the present invention, the feature extraction module is specifically configured to extract Fbank features of the speech signal, or to extract MFCC features of the speech signal.
According to an embodiment of the present invention, the classification prediction module is specifically configured to perform classification prediction on the extracted audio features by using a recurrent neural network RNN, a convolutional neural network CNN, or a time-delay neural network TDNN.
According to an embodiment of the invention, the apparatus further comprises a storage module for caching the results of intermediate nodes from the previous N rounds of classification prediction when classification prediction is performed on the extracted audio features by means of a CNN, where N is a positive integer; correspondingly, the classification prediction module is specifically configured to take the cached intermediate-node results of the previous N rounds together with the audio features of the current round as the input of the CNN to perform classification prediction.
According to an embodiment of the invention, the apparatus further comprises a control execution module for controlling execution of the operation corresponding to the command word with the highest probability.
According to an embodiment of the invention, the device is a smart headset or a microphone.
The embodiment of the invention provides a command word detection method and device supporting different languages: first, a speech signal containing at least two different languages is collected; then audio features of the speech signal are extracted; next, classification prediction is performed on the extracted audio features through a deep learning network to obtain a classification prediction result; and, if the classification prediction result shows that a command word has the highest probability, that command word is taken as the output result. By constructing a single deep learning network supporting multiple languages and multiple command words, the invention can classify the extracted audio features and directly obtain the probabilities of the command words and non-command words, thereby overcoming the limitation that existing command word detection methods support only a single language and only a single command word; moreover, storage and computational resources are saved to a great extent, and scalability is good.
It is to be understood that the teachings of the present invention need not achieve all of the above-described benefits, but rather that specific embodiments may achieve specific technical results, and that other embodiments of the present invention may achieve benefits not mentioned above.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Fig. 1 is a schematic diagram showing a basic principle of a command word detection method in the related art;
FIG. 2 is a first flowchart illustrating an implementation of a command word detection method supporting different languages according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a basic principle of a command word detection method supporting different languages according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a specific implementation of a command word detection method for supporting different languages according to an exemplary application of the present invention;
Fig. 5 is a schematic diagram illustrating a configuration of a command word detection device supporting different languages according to an embodiment of the present invention.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given only to enable those skilled in the art to better understand and to implement the present invention, and do not limit the scope of the present invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The technical solution of the present invention is further elaborated below with reference to the drawings and the specific embodiments.
Fig. 1 is a schematic diagram illustrating the basic principle of a command word detection method in the related art. Referring to Fig. 1, the existing method mainly performs feature extraction on the input speech signal and feeds the result to a deep learning network. The network outputs the probability of each word of the command phrase, and a post-processing module then derives the probability that the input speech signal contains the command word. When this probability exceeds a given threshold, the system judges that the input speech signal contains the command word; otherwise, it determines that no command word was detected. Clearly, such a method supports only a single language and the detection of only a single command word.
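The related-art post-processing step can be sketched as follows. The combination rule (multiplying per-word probabilities into one phrase score) and the threshold value are illustrative assumptions, not taken from the text.

```python
def related_art_detect(word_probs, threshold=0.5):
    """Single-command post-processing sketch: combine the network's per-word
    probabilities into one score for the fixed command phrase and compare it
    against a threshold. Returns True when the command word is detected."""
    score = 1.0
    for p in word_probs:   # probability of each word of the command phrase
        score *= p
    return score > threshold
```

Note there is one fixed phrase and one threshold: supporting a second language or a second command word requires a second model, which is exactly the limitation the invention addresses.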
In order to add support for multiple command words in different languages to the existing detection method, a simple and direct approach is to run multiple models simultaneously, each detecting the wake-up words of one language. However, this consumes more memory and computational resources and scales poorly.
Therefore, in order to solve the problem that the conventional command word detection method supports only a single language and only the detection of a single command word, the present invention provides a command word detection method supporting different languages, as shown in Fig. 2. Referring to Fig. 2, the method according to the embodiment of the present invention includes: operation 201, collecting a speech signal containing at least two different languages; operation 202, extracting audio features of the speech signal; operation 203, performing classification prediction on the extracted audio features through a deep learning network to obtain a classification prediction result; and operation 204, if the classification prediction result shows that a command word has the highest probability, taking that command word as the output result.
In operation 201, a speech signal containing at least two different languages is collected. The speech signal may, for example, be in wav format with 16-bit samples, a 16000 Hz sampling rate, and a single channel. Of course, those skilled in the art should understand that these parameters are only an example; the embodiments of the present invention do not limit the specific parameters of the speech signal.
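The example format above can be checked with the standard-library `wave` module; a small sketch (the function name is illustrative, and the hard assertions stand in for whatever error handling a real system would use):

```python
import wave

def load_mono_16k(path):
    """Read a wav file, checking it matches the example format from the text:
    16-bit samples, 16000 Hz sampling rate, single channel."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2       # 16-bit samples
        assert wf.getframerate() == 16000   # 16 kHz sampling rate
        assert wf.getnchannels() == 1       # single channel
        return wf.readframes(wf.getnframes())
```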
In operation 202, feature extraction may be performed on the collected speech signal in units of frames. Specifically, Fbank feature extraction may be performed on the acquired speech signal, or MFCC feature extraction may be performed. Of course, the feature extraction method is not limited to Fbank and MFCC; any other suitable feature extraction method, whether existing now or developed in the future, may be used.
In one application example, feature extraction may be performed on the collected speech signal frame by frame, e.g. on the first frame, the second frame, the third frame, and so on, in the order in which the speech frames are acquired in real time. This ensures the completeness of feature extraction, which in turn ensures the accuracy of subsequent operations and ultimately the accuracy of the command word detection method.
In yet another application example, feature extraction may be performed on the acquired speech signal by skipping frames with a certain step size, e.g. on the first frame, the third frame, the fifth frame, and so on. Because adjacent frames overlap, a reasonably chosen frame-skipping step reduces the overall computational cost of the method while still preserving the completeness of feature extraction, and thus the accuracy of subsequent operations and of the detection method as a whole.
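Both extraction schedules reduce to one small helper; the per-frame extractor is a hypothetical stand-in for the Fbank/MFCC computation.

```python
def extract_with_step(frames, extract, step=2):
    """Feature extraction with frame skipping: step=1 processes every frame,
    step=2 processes the first, third, fifth frame, and so on. Because
    consecutive frames overlap, a moderate step trades little completeness
    for a large reduction in computation."""
    return [extract(frames[i]) for i in range(0, len(frames), step)]
```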
According to an embodiment of the present invention, in operation 203, classification prediction may be performed on the extracted audio features, typically by means of an RNN, CNN, or TDNN; the most suitable network structure may be chosen according to the results of training the deep learning network. When performing classification prediction by means of a CNN, the method further includes: caching the results of intermediate nodes from the previous N rounds of classification prediction, where N is a positive integer; correspondingly, performing classification prediction on the extracted audio features by means of the CNN includes: taking the cached intermediate-node results of the previous N rounds together with the audio features of the current round as the input of the CNN to perform classification prediction.
In operation 203, the extracted audio features may first be classified through a deep learning network supporting multiple languages, yielding the probabilities of the command words and of the non-command class; the category of the word with the highest probability is then determined to obtain the classification prediction result.
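Reading the network's output layer as one distribution over all supported command words plus a non-command class can be sketched like this; the softmax readout is an assumption, since the patent does not specify the output activation.

```python
import math

def classify_output(logits):
    """Convert raw network outputs (one per command word in each supported
    language, plus a non-command class) into probabilities with a softmax,
    then pick the highest-probability class."""
    peak = max(logits.values())
    exps = {k: math.exp(v - peak) for k, v in logits.items()}  # numerically stable
    total = sum(exps.values())
    probs = {k: e / total for k, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs
```

Because every command word in every language is just another output class, adding a new wake-up phrase extends the output layer rather than requiring a new model.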
Referring to the basic principle shown in fig. 3, the embodiment of the present invention can perform classification prediction on the audio features of the extracted speech signal by constructing a deep learning network supporting multiple language multiple command words, and directly obtain the probabilities of multiple command words, such as the probability of command word 1, the probability of command word 2 …, and the probability of non-command word.
According to an embodiment of the invention, the method further comprises: if the non-command class has the highest probability in the classification prediction result, judging that the speech signal does not contain a command word.
Referring to the application example shown in Fig. 4, after the probabilities of the command words and the non-command class are obtained, operation 204 determines the category of the word with the highest probability to obtain the classification prediction result. If the highest-probability category is a command word, that command word is output; otherwise no command word is detected, i.e. the speech signal does not contain a command word.
According to an embodiment of the present invention, after operation 204 the method further comprises: controlling execution of the operation corresponding to the command word with the highest probability.
For example, if the command word with the highest output probability is one that wakes up the device, such as "Hey Siri" on an Apple phone or "Xiao Ai Tong Xue" on the Xiaomi AI speaker, the device can directly execute the corresponding wake-up operation.
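The control-execution step amounts to dispatching the detected word to a registered device operation; the mapping and function name below are hypothetical.

```python
def execute_command(word, actions):
    """Dispatch the detected highest-probability command word to the device
    operation registered for it; `actions` maps command words to callables.
    Returns True if an operation was executed."""
    action = actions.get(word)
    if action is None:
        return False          # no operation registered for this word
    action()
    return True
```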
The embodiment of the invention provides a command word detection method and device supporting different languages: first, a speech signal containing at least two different languages is collected; then audio features of the speech signal are extracted; next, classification prediction is performed on the extracted audio features through a deep learning network to obtain a classification prediction result; and, if the classification prediction result shows that a command word has the highest probability, that command word is taken as the output result. By constructing a single deep learning network supporting multiple languages and multiple command words, the invention can classify the extracted audio features and directly obtain the probabilities of the command words and non-command words, thereby overcoming the limitation that existing command word detection methods support only a single language and only a single command word; moreover, storage and computational resources are saved to a great extent, and scalability is good.
Based on the above command word detection method supporting different languages, an embodiment of the present invention further provides a command word detection device supporting different languages. As shown in Fig. 5, the device 50 includes: an acquisition module 501 configured to collect a speech signal containing at least two different languages; a feature extraction module 502 configured to extract audio features of the speech signal; a classification prediction module 503 configured to perform classification prediction on the extracted audio features through a deep learning network to obtain a classification prediction result; and an output module 504 configured to take the command word with the highest probability as the output result if the classification prediction result shows that a command word has the highest probability.
According to an embodiment of the present invention, the classification prediction module 503 includes: a classification unit for classifying the extracted audio features through a deep learning network supporting multiple languages to obtain the probabilities of command words and non-command words; and a judging unit for determining the category of the word with the highest probability among the obtained probabilities to obtain the classification prediction result.
According to an embodiment of the present invention, the output module 504 is further configured to: if the non-command class has the highest probability in the classification prediction result, judge that the speech signal does not contain a command word.
According to an embodiment of the present invention, the feature extraction module 502 is specifically configured to extract Fbank features of the speech signal, or to extract MFCC features of the speech signal.
According to an embodiment of the present invention, the classification prediction module 503 is specifically configured to perform classification prediction on the extracted audio features by means of a recurrent neural network (RNN), a convolutional neural network (CNN), or a time delay neural network (TDNN).
According to an embodiment of the invention, the apparatus 50 further comprises a storage module for caching the results of intermediate nodes from the previous N rounds of classification prediction when classification prediction is performed on the extracted audio features by means of a CNN, where N is a positive integer; correspondingly, the classification prediction module 503 is specifically configured to take the cached intermediate-node results of the previous N rounds together with the audio features of the current round as the input of the CNN to perform classification prediction.
According to an embodiment of the invention, the apparatus 50 further comprises a control execution module for controlling execution of the operation corresponding to the command word with the highest probability.
The device 50 may be a smart headset or a microphone according to an embodiment of the present invention.
Also, based on the command word detection method supporting different languages as described above, an embodiment of the present invention further provides a computer-readable storage medium storing a program, which, when executed by a processor, causes the processor to perform at least the following operation steps: operation 201, collecting voice signals at least including two different languages; operation 202, extracting an audio feature of the speech signal; operation 203, performing classification prediction on the extracted audio features through a deep learning network to obtain a classification prediction result; in operation 204, if the classification prediction result is that the probability of the command word is the maximum, the command word with the maximum probability is taken as an output result.
Here, it should be noted that the above description of the embodiments of the command word detection device and the computer storage medium supporting different languages is similar to the description of the method embodiments shown in Figs. 2 to 4, with similar beneficial effects. For technical details not disclosed in the device embodiment, please refer to the description of the method embodiments shown in Figs. 2 to 4; for brevity, they are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be completed by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable memory device, a read-only memory (ROM), a magnetic disk, or an optical disk.
Alternatively, if the integrated unit of the present invention is implemented in the form of a software functional module and sold or used as a separate product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a removable storage device, a ROM, a magnetic or optical disk, or any other medium that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

CN201910932340.0A · 2019-09-29 (priority) · 2019-09-29 (filed) · Command word detection method and device supporting different languages · Pending · CN110580908A (en)

Priority Applications (1)

Application Number: CN201910932340.0A · Priority Date: 2019-09-29 · Filing Date: 2019-09-29 · Title: Command word detection method and device supporting different languages

Publications (1)

Publication Number: CN110580908A (en) · Publication Date: 2019-12-17

Family

ID=68813983

Family Applications (1)

Application Number: CN201910932340.0A · Status: Pending · Publication: CN110580908A (en) · Priority Date: 2019-09-29 · Filing Date: 2019-09-29 · Title: Command word detection method and device supporting different languages

Country Status (1)

Country: CN (1) · Link: CN110580908A (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
CN102543072A (en) * · 2010-12-07 · 2012-07-04 · 王诚本 · System and method for real-time detection of fatigue
CN103077714A (en) * · 2013-01-29 · 2013-05-01 · 华为终端有限公司 · Information identification method and apparatus
CN103400577A (en) * · 2013-08-01 · 2013-11-20 · 百度在线网络技术(北京)有限公司 · Acoustic model building method and device for multi-language voice identification
CN103559289A (en) * · 2013-11-08 · 2014-02-05 · 安徽科大讯飞信息科技股份有限公司 · Language-independent keyword search method and system
CN103839545A (en) * · 2012-11-23 · 2014-06-04 · 三星电子株式会社 · Apparatus and method for constructing multilingual acoustic model
CN104103272A (en) * · 2014-07-15 · 2014-10-15 · 无锡中星微电子有限公司 · Voice recognition method and device, and Bluetooth earphone
CN105139864A (en) * · 2015-08-17 · 2015-12-09 · 北京天诚盛业科技有限公司 · Voice recognition method and voice recognition device
CN105229725A (en) * · 2013-03-11 · 2016-01-06 · 微软技术许可有限责任公司 · Multilingual deep neural network
CN105741838A (en) * · 2016-01-20 · 2016-07-06 · 百度在线网络技术(北京)有限公司 · Voice wakeup method and voice wakeup device
WO2016110068A1 (en) * · 2015-01-07 · 2016-07-14 · 中兴通讯股份有限公司 · Voice switching method and apparatus for voice recognition device
CN106847281A (en) * · 2017-02-26 · 2017-06-13 · 上海新柏石智能科技股份有限公司 · Intelligent household voice control system and method based on fuzzy voice recognition technology
CN107767863A (en) * · 2016-08-22 · 2018-03-06 · 科大讯飞股份有限公司 · Voice wake-up method, system, and intelligent terminal
CN108510976A (en) * · 2017-02-24 · 2018-09-07 · 芋头科技(杭州)有限公司 · Multilingual mixed speech recognition method
CN109065043A (en) * · 2018-08-21 · 2018-12-21 · 广州市保伦电子有限公司 · Command word recognition method and computer storage medium
CN109065020A (en) * · 2018-07-28 · 2018-12-21 · 重庆柚瓣家科技有限公司 · Multilingual classification recognition library matching method and system


Similar Documents

Publication | Title
EP3611663B1 (en) | Image recognition method, terminal and storage medium
CN110570840B (en) | Intelligent device wake-up method and device based on artificial intelligence
CN111880856B (en) | Voice wake-up method, device, electronic equipment and storage medium
CN112825248B (en) | Voice processing method, model training method, interface display method and equipment
CN111192590B (en) | Voice wake-up method, device, equipment and storage medium
CN113035231B (en) | Keyword detection method and device
CN111128134A (en) | Acoustic model training method, voice wake-up method, device and electronic equipment
KR102688236B1 (en) | Voice synthesizer using artificial intelligence, operating method of voice synthesizer, and computer-readable recording medium
US11030994B2 (en) | Selective activation of smaller-resource-footprint automatic speech recognition engines by predicting a domain topic based on a time since a previous communication
CN111105786B (en) | Multi-sampling-rate voice recognition method, device, system and storage medium
CN114627863A (en) | Speech recognition method and device based on artificial intelligence
CN115798459B (en) | Audio processing method and device, storage medium and electronic equipment
CN114360510B (en) | Speech recognition method and related device
CN111312233A (en) | Voice data identification method, device and system
CN112037772A (en) | Multi-mode-based response obligation detection method, system and device
CN113838462A (en) | Voice wake-up method and device, electronic equipment and computer-readable storage medium
CN111326146A (en) | Method and device for acquiring a voice wake-up template, electronic equipment and computer-readable storage medium
CN113782014A (en) | Voice recognition method and device
WO2024093578A1 (en) | Voice recognition method and apparatus, electronic device, storage medium and computer program product
CN112948763B (en) | Piece quantity prediction method and device, electronic equipment and storage medium
CN113225624B (en) | Method and device for determining the time consumption of voice recognition
KR102642617B1 (en) | Voice synthesizer using artificial intelligence, operating method of voice synthesizer, and computer-readable recording medium
US20200193981A1 (en) | Personalized phrase spotting during automatic speech recognition
CN110580908A (en) | Command word detection method and device supporting different languages
CN110556099B (en) | Command word control method and device

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication (application publication date: 2019-12-17)
