CN110602624B - Audio test method, device, storage medium and electronic equipment - Google Patents

Audio test method, device, storage medium and electronic equipment

Info

Publication number
CN110602624B
Authority
CN
China
Prior art keywords
preset
processor
verified
audio
audio signal
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910818540.3A
Other languages
Chinese (zh)
Other versions
CN110602624A (en)
Inventor
陈喆 (Chen Zhe)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910818540.3A
Publication of CN110602624A
Application granted
Publication of CN110602624B
Current legal status: Expired - Fee Related


Abstract

Embodiments of the present application disclose an audio testing method, apparatus, storage medium, and electronic device. The electronic device uses a microphone to repeatedly collect audio signals to be verified, a preset number of times. These signals undergo first-level verification by the device's dedicated speech recognition chip and second-level verification by its processor. A preset counting application receives the first indication information that the dedicated speech recognition chip sends whenever first-level verification passes. It uses this to count the chip's successful verifications and obtain a first count result. The same application receives the second indication information that the processor sends whenever second-level verification passes, counts the processor's successful verifications, and obtains a second count result. Finally, a first wake-up rate of the dedicated speech recognition chip is calculated from the first count result and the preset number. A second wake-up rate of the processor is calculated from the first and second count results. This enables efficient testing of the electronic device's wake-up rate.

Description

Audio testing method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of audio testing, and in particular, to an audio testing method and apparatus, a storage medium, and an electronic device.
Background
Speech recognition is an important way for electronic devices such as smartphones and tablet computers to capture a user's intent, and it has become a standard feature on many electronic devices. For example, when it is inconvenient for a user to operate an electronic device directly, the user can control it by speaking a voice command.
Speech recognition can be divided into two stages, wake-up and recognition, and the electronic device can be controlled by voice only after it has been woken up. The wake-up rate is therefore an important performance indicator of an electronic device, which makes it increasingly important to obtain that rate through efficient testing.
Disclosure of Invention
The embodiments of the present application provide an audio testing method, an audio testing apparatus, a storage medium, and an electronic device, which can efficiently test the wake-up rate of the electronic device.
In a first aspect, an embodiment of the present application provides an audio testing method applied to an electronic device. The electronic device includes a microphone, a dedicated speech recognition chip, and a processor, and is placed in a pre-built test environment in which a voice playback device plays test speech. The test speech is a clean speech signal that contains a preset wake-up word. The audio testing method includes:
collecting audio through the microphone to obtain an audio signal to be verified, and providing the audio signal to be verified to the dedicated speech recognition chip;
performing primary verification on the audio signal to be verified through the dedicated speech recognition chip, and, when the verification passes, providing the audio signal to be verified to the processor and sending first indication information to a preset counting application, instructing the preset counting application to count so as to obtain a first count result corresponding to the dedicated speech recognition chip;
performing secondary verification on the audio signal to be verified through the processor, and, when the verification passes, sending second indication information to the preset counting application, instructing the preset counting application to count so as to obtain a second count result corresponding to the processor;
determining whether the number of primary verifications has reached a preset number; if not, collecting an audio signal to be verified again through the microphone for verification; if so, obtaining the first count result and the second count result; and
calculating a first wake-up rate of the dedicated speech recognition chip from the first count result and the preset number, and calculating a second wake-up rate of the processor from the first count result and the second count result.
In a second aspect, an embodiment of the present application provides an audio testing apparatus applied to an electronic device. The electronic device includes a microphone, a dedicated speech recognition chip, and a processor, and is placed in a pre-built test environment in which a voice playback device plays test speech. The test speech is a clean speech signal that contains a preset wake-up word. The audio testing apparatus includes:
an audio collection module, configured to collect audio through the microphone to obtain an audio signal to be verified, and to provide the audio signal to be verified to the dedicated speech recognition chip;
a primary verification module, configured to perform primary verification on the audio signal to be verified through the dedicated speech recognition chip and, when the verification passes, to provide the audio signal to be verified to the processor and send first indication information to a preset counting application, instructing the preset counting application to count so as to obtain a first count result corresponding to the dedicated speech recognition chip;
a secondary verification module, configured to perform secondary verification on the audio signal to be verified through the processor and, when the verification passes, to send second indication information to the preset counting application, instructing the preset counting application to count so as to obtain a second count result corresponding to the processor;
a result obtaining module, configured to determine whether the number of primary verifications has reached a preset number; if not, to collect an audio signal to be verified again through the microphone for verification; and if so, to obtain the first count result and the second count result; and
a rate statistics module, configured to calculate a first wake-up rate of the dedicated speech recognition chip from the first count result and the preset number, and to calculate a second wake-up rate of the processor from the first count result and the second count result.
In a third aspect, an embodiment of the present application provides a storage medium storing a computer program. When the program is loaded by a processor and a dedicated speech recognition chip, it executes the audio testing method provided in the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides an electronic device including a microphone, a dedicated speech recognition chip, a processor, and a memory. The memory stores a computer program. When the dedicated speech recognition chip and the processor invoke it, the program executes the audio testing method provided in the embodiments of the present application.
In the embodiments of the present application, the microphone repeatedly collects audio signals to be verified, a preset number of times. These signals undergo primary verification by the dedicated speech recognition chip and secondary verification by the processor. The preset counting application receives the first indication information that the dedicated speech recognition chip sends whenever primary verification passes. It counts the chip's successful verifications to obtain a first count result. The application also receives the second indication information that the processor sends whenever secondary verification passes, and counts the processor's successful verifications to obtain a second count result. Finally, the first wake-up rate of the dedicated speech recognition chip is calculated from the first count result and the preset number. The second wake-up rate of the processor is calculated from the first and second count results, enabling efficient testing of the wake-up rate of the electronic device.
Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an audio testing method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of invoking a primary text verification model in the embodiment of the present application.
Fig. 3 is a schematic diagram of a test environment built in the embodiment of the present application.
Fig. 4 is another schematic flow chart of an audio testing method according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of an audio testing apparatus according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Fig. 7 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, in which like reference numerals denote like elements, the principles of the present application are illustrated as implemented in a suitable computing environment. The following description is based on exemplary specific embodiments of the present application and should not be construed as limiting other specific embodiments that are not detailed herein.
Embodiments of the present application provide an audio testing method, an audio testing apparatus, a storage medium, and an electronic device. The audio testing method may be executed by the audio testing apparatus provided in the embodiments of the present application, or by an electronic device into which the audio testing apparatus is integrated. The audio testing apparatus may be implemented in hardware or software. The electronic device may be a computing device such as:
a laptop computer, a computer monitor containing an embedded computer, a tablet computer, a cellular phone, a media player, or another handheld or portable electronic device;
a smaller device, such as a wristwatch device, a pendant device, an earphone or headphone device, a device embedded in glasses or other equipment worn on the user's head, or another wearable or miniature device;
a television, a computer display that does not contain an embedded computer, a gaming device, or a navigation device;
an embedded system, such as a system in which an electronic device with a display is installed in a kiosk or automobile; and the like.
As shown in fig. 1, the flow of the audio testing method provided by the embodiment of the present application may be as follows:
101. Collect audio through the microphone to obtain an audio signal to be verified, and provide the audio signal to be verified to the dedicated speech recognition chip.
In this embodiment, a test environment for the audio test is built in advance. For example, to eliminate external interference, a sound-insulated test environment may be built, containing a voice playback device that plays the test speech. The test speech is a clean speech signal containing the preset wake-up word. For example, the voice playback device may be an artificial head that plays the test speech in a loop at 5-second intervals. The preset wake-up word may be set by a person skilled in the art as needed and is not specifically limited in this embodiment; for example, it may be set to "Xiao Ou" (小欧).
Before the audio test starts, the electronic device under test is placed in the test environment. The voice playback device then plays the test speech to simulate a real usage scenario, and the electronic device is tested to determine its wake-up rate.
The electronic device in this embodiment includes a microphone, a dedicated speech recognition chip, and a processor. The dedicated speech recognition chip is a chip designed specifically for speech recognition, such as a digital signal processing chip or an application-specific integrated circuit chip designed for speech recognition. It consumes less power than a general-purpose processor but has relatively weaker processing capability. Because of this, voice wake-up is split into stages:
the dedicated speech recognition chip first performs primary (coarse) verification of the collected audio signal;
when primary verification passes, the processor performs secondary verification of the same audio signal to ensure the accuracy of the overall verification;
only when secondary verification passes is the voice interaction application woken up to interact with the user by voice.
Voice interaction applications are also known as voice assistants, for example "Xiao Ou".
During the audio test, the electronic device collects audio through its microphone to obtain an audio signal corresponding to the test speech, which is recorded as the audio signal to be verified.
The microphone of the electronic device may be a built-in microphone or an external microphone (wired or wireless). If the microphone is analog, the collected analog audio signal to be verified must undergo analog-to-digital conversion to obtain a digitized audio signal for subsequent processing. If the microphone is digital, it collects a digitized audio signal to be verified directly, and no analog-to-digital conversion is needed.
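For readers unfamiliar with the analog-to-digital step, the following minimal Python sketch shows only its quantization part. It assumes the analog signal has already been sampled into floating-point values normalized to [-1.0, 1.0] and that signed 16-bit PCM is the target format. The patent specifies neither the sample format nor the converter, and `quantize_to_pcm16` is a hypothetical name.

```python
from typing import Iterable, List

PCM16_MAX = 32767  # largest value of a signed 16-bit sample


def quantize_to_pcm16(samples: Iterable[float]) -> List[int]:
    """Quantize normalized samples in [-1.0, 1.0] to signed 16-bit PCM values.

    Hypothetical helper: out-of-range input is clipped, which mimics the
    saturation of a real analog-to-digital converter.
    """
    pcm = []
    for x in samples:
        clipped = max(-1.0, min(1.0, x))  # saturate like an ADC would
        pcm.append(int(round(clipped * PCM16_MAX)))
    return pcm


if __name__ == "__main__":
    print(quantize_to_pcm16([0.0, 1.0, -1.0, 1.7]))  # [0, 32767, -32767, 32767]
```

With a digital microphone this step happens inside the microphone, so the device receives integer samples like these directly.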
After collecting the audio signal to be verified, the electronic device provides it to the dedicated speech recognition chip.
102. Perform primary verification on the audio signal to be verified through the dedicated speech recognition chip. When the verification passes, provide the audio signal to be verified to the processor and send first indication information to a preset counting application, instructing the preset counting application to count so as to obtain a first count result corresponding to the dedicated speech recognition chip.
Primary verification of the audio signal to be verified may check its text feature alone, or both its text feature and its voiceprint feature; a person of ordinary skill in the art may choose according to the actual situation. In this embodiment, for example, the dedicated speech recognition chip verifies only the text feature of the audio signal to be verified.
In plain terms, verifying the text feature means checking whether the audio signal to be verified contains the preset wake-up word. As long as it does, the text-feature verification passes, regardless of who spoke the wake-up word.
For primary verification, the dedicated speech recognition chip may load a pre-trained primary wake-up model that checks whether an audio signal contains the preset wake-up word, and verify the audio signal to be verified with that model.
In addition, this embodiment adds a message-sending mechanism to the dedicated speech recognition chip. When primary verification of the audio signal to be verified passes, the chip sends the first indication information to the operating system of the electronic device.
The following description takes an electronic device running the Android system as an example.
When primary verification of the audio signal to be verified passes, the dedicated speech recognition chip sends the first indication information to the Android system of the electronic device.
This embodiment also provides a counting application designed in advance, which a person of ordinary skill in the art can implement in any suitable programming language as needed. The preset counting application needs to learn whether the dedicated speech recognition chip's primary verification of the audio signal has passed. It therefore registers for the first indication information with the Android system in advance, so that the Android system pushes the first indication information to it.
When the preset counting application receives the first indication information, it counts accordingly to obtain the first count result for the dedicated speech recognition chip. For example, the preset counting application creates a first count value for the dedicated speech recognition chip, initialized to zero. It increments this value by one each time it receives the first indication information, that is, each time the dedicated speech recognition chip passes primary verification of a collected audio signal. This tallies the chip's successful verifications and yields the first count result.
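The registration-and-push mechanism and the counter described above can be modeled in Python as a small publish/subscribe dispatcher. This is a hypothetical sketch, not Android code. `IndicationBus` stands in for the operating system's message delivery and `CountingApp` for the preset counting application. The indication names are illustrative strings, not real Android intents or APIs.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

FIRST_INDICATION = "first_indication"    # hypothetical: sent by the speech chip
SECOND_INDICATION = "second_indication"  # hypothetical: sent by the processor


class IndicationBus:
    """Stand-in for the operating system's message delivery (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[], None]]] = defaultdict(list)

    def register(self, indication: str, handler: Callable[[], None]) -> None:
        # An application registers in advance for the indications it wants pushed.
        self._subscribers[indication].append(handler)

    def send(self, indication: str) -> None:
        # Push the indication to every handler registered for it.
        for handler in self._subscribers[indication]:
            handler()


class CountingApp:
    """Stand-in for the preset counting application (hypothetical)."""

    def __init__(self, bus: IndicationBus) -> None:
        self.first_count = 0   # successful primary verifications (chip)
        self.second_count = 0  # successful secondary verifications (processor)
        bus.register(FIRST_INDICATION, self._on_first)
        bus.register(SECOND_INDICATION, self._on_second)

    def _on_first(self) -> None:
        self.first_count += 1

    def _on_second(self) -> None:
        self.second_count += 1


if __name__ == "__main__":
    bus = IndicationBus()
    app = CountingApp(bus)
    for _ in range(3):
        bus.send(FIRST_INDICATION)
    bus.send(SECOND_INDICATION)
    print(app.first_count, app.second_count)  # 3 1
```

Both counters start at zero and change only when an indication is pushed. A failed verification therefore simply leaves the counts untouched, which matches the "no indication is sent" behavior of the method.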
When primary verification of the audio signal to be verified passes, the dedicated speech recognition chip also provides the audio signal collected by the microphone in this round to the processor.
The audio signal to be verified may fail primary verification, because of the signal itself and/or the primary wake-up model. In that case, no first indication information is sent, the audio signal is discarded, and the flow proceeds to 104.
103. Perform secondary verification on the audio signal to be verified through the processor. When the verification passes, send second indication information to the preset counting application, instructing the preset counting application to count so as to obtain a second count result corresponding to the processor.
Secondary verification of the audio signal to be verified may likewise check its text feature alone, or both its text feature and its voiceprint feature, as a person of ordinary skill in the art chooses. In this embodiment, for example, the processor verifies both the text feature and the voiceprint feature of the audio signal to be verified.
For secondary verification, the processor may, for example, load a pre-trained secondary wake-up model. This model checks whether an audio signal contains the preset wake-up word and whether its voiceprint feature matches a preset voiceprint feature, and the processor uses it to verify the audio signal to be verified.
When secondary verification of the audio signal to be verified passes, the processor sends the second indication information to the Android system of the electronic device.
Correspondingly, the preset counting application needs to learn whether the processor's secondary verification of the audio signal has passed. It therefore registers for the second indication information with the Android system in advance, so that the Android system pushes the second indication information to it. When the preset counting application receives the second indication information, it counts accordingly to obtain the second count result for the processor. For example, it creates a second count value for the processor, initialized to zero. It increments this value by one each time it receives the second indication information, that is, each time the processor passes secondary verification of a collected audio signal. This tallies the processor's successful verifications and yields the second count result.
Likewise, the audio signal to be verified may fail secondary verification, because of the signal itself and/or the secondary wake-up model. In that case, no second indication information is sent, the audio signal is discarded, and the flow proceeds to 104.
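One verification round, covering steps 102 and 103 and both discard paths, can be sketched in Python as follows. The two stage checks are passed in as plain callables, because the real wake-up models are not specified. `run_round`, `RoundResult`, and the `notify` callback are hypothetical names. The sketch models only the control flow: the processor runs only after the chip passes, and each stage reports success only when it passes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Audio = Sequence[float]  # hypothetical representation of a captured signal


@dataclass
class RoundResult:
    primary_passed: bool
    secondary_passed: bool


def run_round(
    audio: Audio,
    primary_check: Callable[[Audio], bool],    # dedicated speech chip (stand-in)
    secondary_check: Callable[[Audio], bool],  # processor (stand-in)
    notify: Callable[[str], None],             # delivers indications to the counting app
) -> RoundResult:
    """Run one primary/secondary verification round on a captured signal."""
    if not primary_check(audio):
        # No first indication; the signal is discarded and the flow goes to 104.
        return RoundResult(False, False)
    notify("first_indication")
    # The chip hands the same signal to the processor for secondary verification.
    if not secondary_check(audio):
        # No second indication; the signal is discarded and the flow goes to 104.
        return RoundResult(True, False)
    notify("second_indication")
    return RoundResult(True, True)


if __name__ == "__main__":
    sent = []
    result = run_round([0.1, 0.2], lambda a: True, lambda a: False, sent.append)
    print(result, sent)  # RoundResult(primary_passed=True, secondary_passed=False) ['first_indication']
```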
104. Determine whether the number of primary verifications has reached the preset number. If not, collect an audio signal to be verified again through the microphone for verification; if so, obtain the first count result and the second count result.
The electronic device also counts how many primary verifications it has performed. After each verification round ends, whether the signal failed primary verification or the processor has completed secondary verification, it determines whether that number has reached the preset number. The preset number can be set by a person skilled in the art as needed, for example, to 100.
If the number of primary verifications has not reached the preset number, the electronic device collects an audio signal to be verified again through the microphone and verifies it.
Once the number of primary verifications reaches the preset number, the electronic device obtains the first count result and the second count result from the preset counting application.
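The loop in step 104 can be sketched in Python as a driver that repeats verification rounds until the number of primary verifications reaches the preset number, then returns both counts. `run_test` and the `capture_and_verify` callable are hypothetical. The callable is assumed to capture one signal, run both stages, and report whether each stage passed. For brevity the counts are kept locally here instead of in a separate counting application.

```python
from typing import Callable, Tuple

# Hypothetical: captures one signal, runs both stages, and returns
# (primary_passed, secondary_passed) for that round.
RoundFn = Callable[[], Tuple[bool, bool]]


def run_test(capture_and_verify: RoundFn, preset_number: int = 100) -> Tuple[int, int]:
    """Repeat verification rounds and return (first_count, second_count)."""
    if preset_number <= 0:
        raise ValueError("preset_number must be positive")
    primary_runs = 0  # primary verifications performed so far
    first_count = 0   # primary verifications that passed
    second_count = 0  # secondary verifications that passed
    while primary_runs < preset_number:
        primary_passed, secondary_passed = capture_and_verify()
        primary_runs += 1
        if primary_passed:
            first_count += 1
            # Secondary verification only runs after a primary pass.
            if secondary_passed:
                second_count += 1
    return first_count, second_count


if __name__ == "__main__":
    # Scripted outcomes for 5 rounds: the chip rejects one, the processor rejects one.
    outcomes = iter([(True, True), (False, False), (True, True), (True, False), (True, True)])
    print(run_test(lambda: next(outcomes), preset_number=5))  # (4, 3)
```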
105. Calculate a first wake-up rate of the dedicated speech recognition chip from the first count result and the preset number, and calculate a second wake-up rate of the processor from the first count result and the second count result.
For example, with the preset number set to 100, the dedicated speech recognition chip performs 100 primary verifications. Suppose the first count result is 99, that is, 99 of the 100 primary verifications passed. The first wake-up rate of the dedicated speech recognition chip is then 99/100 = 99%. Suppose the second count result is 98. Secondary verification runs only when primary verification passes, and not at all when it fails. So 98 of the 99 secondary verifications passed, and the second wake-up rate of the processor is 98/99 ≈ 99%. In addition, the overall wake-up rate of the dedicated speech recognition chip and the processor combined, recorded as a third wake-up rate, can be calculated from the second count result and the preset number: 98/100 = 98%.
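The three rates are simple ratios: first count over the preset number, second count over the first count, and second count over the preset number. The following Python sketch checks the worked figures in the example. `wake_up_rates` is a hypothetical helper name.

```python
from typing import Dict


def wake_up_rates(preset_number: int, first_count: int, second_count: int) -> Dict[str, float]:
    """Compute the chip, processor, and overall wake-up rates.

    first_count: primary verifications passed by the dedicated speech chip.
    second_count: secondary verifications passed by the processor. Secondary
    verification only runs after a primary pass, so first_count is its denominator.
    """
    if preset_number <= 0 or not 0 <= second_count <= first_count <= preset_number:
        raise ValueError("expected 0 <= second_count <= first_count <= preset_number, preset_number > 0")
    return {
        "first": first_count / preset_number,  # chip wake-up rate
        # Processor wake-up rate; reported as 0.0 here if the chip never passed.
        "second": second_count / first_count if first_count else 0.0,
        "third": second_count / preset_number,  # overall wake-up rate
    }


if __name__ == "__main__":
    rates = wake_up_rates(preset_number=100, first_count=99, second_count=98)
    for name, value in rates.items():
        print(f"{name}: {value:.2%}")
    # first: 99.00%
    # second: 98.99%
    # third: 98.00%
```

Using the first count as the processor's denominator keeps chip failures from being charged to the processor. The third rate is the one an end user would observe.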
In summary, in this embodiment the microphone repeatedly collects audio signals to be verified, a preset number of times. These signals undergo primary verification by the dedicated speech recognition chip and secondary verification by the processor. The preset counting application receives the first indication information that the chip sends whenever primary verification passes, and tallies the chip's successful verifications as the first count result. It also receives the second indication information that the processor sends whenever secondary verification passes, and tallies the processor's successful verifications as the second count result. Finally, the first wake-up rate of the dedicated speech recognition chip is calculated from the first count result and the preset number. The second wake-up rate of the processor is calculated from the first and second count results, enabling efficient testing of the electronic device's wake-up rate.
In an embodiment, the second indication information includes second text indication information and second voiceprint indication information. The second count result includes a second text count result and a second voiceprint count result, and the second wake-up rate includes a second text wake-up rate and a second voiceprint wake-up rate. Performing secondary verification on the audio signal to be verified through the processor includes:
(1) invoking, through the processor, a pre-trained secondary text verification model corresponding to the preset wake-up word to verify whether the audio signal to be verified contains the preset wake-up word, and, if the verification passes, sending second text indication information to the preset counting application, instructing it to count so as to obtain a second text count result corresponding to the processor;
(2) invoking, through the processor, a pre-trained secondary voiceprint verification model corresponding to the test speech to verify whether the voiceprint feature of the audio signal to be verified matches the voiceprint feature of the test speech, and, if the verification passes, sending second voiceprint indication information to the preset counting application, instructing it to count so as to obtain a second voiceprint count result corresponding to the processor.
Calculating the second wake-up rate of the processor from the first count result and the second count result includes:
(3) calculating the second text wake-up rate from the second text count result and the first count result, and calculating the second voiceprint wake-up rate from the second voiceprint count result and the second text count result.
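Each of the two sub-rates divides a stage's pass count by the number of times that stage actually ran. Text verification runs after every primary pass, and voiceprint verification runs only after a text pass. The Python sketch below illustrates this. `secondary_rates` is a hypothetical name, and the counts in the example are illustrative only, not measured results.

```python
from typing import Tuple


def secondary_rates(first_count: int, text_count: int, voiceprint_count: int) -> Tuple[float, float]:
    """Return (second text wake-up rate, second voiceprint wake-up rate).

    Text verification runs after each primary pass, so its denominator is
    first_count; voiceprint verification runs only after a text pass, so its
    denominator is text_count.
    """
    if not 0 <= voiceprint_count <= text_count <= first_count:
        raise ValueError("expected 0 <= voiceprint_count <= text_count <= first_count")
    text_rate = text_count / first_count if first_count else 0.0
    voiceprint_rate = voiceprint_count / text_count if text_count else 0.0
    return text_rate, voiceprint_rate


if __name__ == "__main__":
    # Illustrative counts only.
    text_rate, voiceprint_rate = secondary_rates(first_count=99, text_count=98, voiceprint_count=97)
    print(f"text: {text_rate:.2%}, voiceprint: {voiceprint_rate:.2%}")  # text: 98.99%, voiceprint: 98.98%
```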
This embodiment takes secondary verification by the processor as an example, where secondary verification covers both the text feature and the voiceprint feature.
During secondary verification, the processor first invokes the pre-trained secondary text verification model corresponding to the preset wake-up word. It uses this model to verify whether the audio signal to be verified contains the preset wake-up word.
For example, the secondary text verification model may be trained with a scoring function as a constraint, where the scoring function maps a vector to a numerical value. A person skilled in the art may select a suitable scoring function as needed; this is not limited in the embodiments of the present application.
To verify with the secondary text verification model whether the audio signal to be verified contains the preset wake-up word, a feature vector representing the audio signal is first extracted and input into the model, which scores it. The resulting score is then compared with the discrimination score of the secondary text verification model. If the score reaches the discrimination score, the audio signal to be verified is judged to contain the preset wake-up word.
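The score-and-compare step can be sketched in Python with a linear scoring function (a weighted sum of the feature vector plus a bias) standing in for the trained model. The text leaves the scoring function open, so this choice is an assumption. Feature extraction is out of scope here. The feature vector, weights, and discrimination score below are made-up illustrative values, and `score` and `contains_wake_word` are hypothetical names.

```python
from typing import Sequence


def score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical linear scoring function: maps a feature vector to a single number."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


def contains_wake_word(
    features: Sequence[float],
    weights: Sequence[float],
    discrimination_score: float,
    bias: float = 0.0,
) -> bool:
    """Judge the wake-up word present when the score reaches the discrimination score."""
    return score(features, weights, bias) >= discrimination_score


if __name__ == "__main__":
    feats = [0.5, 1.0, 0.25]   # stand-in for an extracted feature vector
    weights = [2.0, 1.0, 4.0]  # stand-in for trained model parameters
    print(score(feats, weights))                    # 3.0
    print(contains_wake_word(feats, weights, 3.0))  # True (the score reaches the threshold)
    print(contains_wake_word(feats, weights, 3.5))  # False
```

"Reaches" is read as greater than or equal to, so a score exactly equal to the discrimination score passes.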
When the audio signal to be verified contains the preset wake-up word, the processor sends second text indication information to the Android system. So that the preset counting application can learn whether the processor's text feature verification has passed, it registers for the second text indication information in the Android system in advance, which allows the Android system to push that information to it. Each time the preset counting application receives the second text indication information, it counts it to obtain the second text counting result for the processor. For example, the preset counting application creates a second text count value for the processor with an initial value of zero and increments it by one each time the second text indication information is received, that is, each time the processor's text feature verification of an acquired audio signal passes. This yields the number of successful text feature verifications by the processor, i.e., the second text counting result.
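The register-then-count pattern can be illustrated without Android. In this Python sketch, the hypothetical `IndicationBus` only stands in for the Android system's delivery of indication information (it is not an Android API), and the hypothetical `PresetCountingApp` keeps one zero-initialised counter per indication type and increments it on every push.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Iterable, List


class IndicationBus:
    """Hypothetical stand-in for the system pushing indication information to subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def register(self, indication: str, callback: Callable[[str], None]) -> None:
        self._subscribers[indication].append(callback)

    def send(self, indication: str) -> None:
        # Push the indication to every application registered for it.
        for callback in self._subscribers[indication]:
            callback(indication)


class PresetCountingApp:
    def __init__(self, bus: IndicationBus, indications: Iterable[str]) -> None:
        names = list(indications)
        self.counts: Dict[str, int] = {name: 0 for name in names}  # each count starts at zero
        for name in names:
            bus.register(name, self._on_indication)  # register in advance

    def _on_indication(self, indication: str) -> None:
        self.counts[indication] += 1  # one more successful verification at this stage
```

Usage: create `bus = IndicationBus()` and `app = PresetCountingApp(bus, ["first", "second_text", "second_voiceprint"])`; each `bus.send("second_text")` then adds one to `app.counts["second_text"]`.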
In addition, when the text features of the audio signal to be verified pass verification, the electronic device calls, through the processor, a pre-trained secondary voiceprint verification model corresponding to the test voice and uses it to verify whether the voiceprint features of the audio signal to be verified match those of the test voice.
For example, the secondary voiceprint verification model may be obtained by further training the secondary text verification model on the test voice. To check with the secondary voiceprint verification model whether the voiceprint features of the audio signal to be verified match those of the test voice, a feature vector representing the audio signal to be verified is first extracted and input into the secondary voiceprint verification model for scoring, yielding a score. The score is then compared with the discrimination score of the secondary voiceprint verification model; if the score reaches the discrimination score, the voiceprint features of the audio signal to be verified are determined to match those of the test voice.
When the voiceprint features of the audio signal to be verified match those of the test voice, the processor sends second voiceprint indication information to the Android system. So that the preset counting application can learn whether the processor's voiceprint feature verification has passed, it registers for the second voiceprint indication information in the Android system in advance, which allows the Android system to push that information to it. Each time the preset counting application receives the second voiceprint indication information, it counts it to obtain the second voiceprint counting result for the processor. For example, the preset counting application creates a second voiceprint count value for the processor with an initial value of zero and increments it by one each time the second voiceprint indication information is received, that is, each time the processor's voiceprint feature verification of an acquired audio signal passes. This yields the number of successful voiceprint feature verifications by the processor, i.e., the second voiceprint counting result.
Furthermore, if the text features or the voiceprint features of the audio signal to be verified fail verification, the method proceeds to 104.
In an embodiment, before "performing audio acquisition through a microphone to obtain an audio signal to be verified", the method further includes:
(1) obtaining a pre-trained general verification model corresponding to the preset wake-up word, and setting the general verification model as the secondary text verification model;
(2) performing audio acquisition through the microphone to obtain a sample audio signal;
(3) extracting acoustic features of the sample audio signal, adapting the general verification model to those acoustic features, and setting the adapted general verification model as the secondary voiceprint verification model.
For example, before the audio test starts, sample audio signals of the preset wake-up word spoken by many people (e.g., 200 people) may be collected in advance. Acoustic features (e.g., Mel-frequency cepstral coefficients) are extracted from each sample audio signal, and a general verification model corresponding to the preset wake-up word is trained on those features. Because the general verification model is trained on a large number of audio signals not tied to any specific person (i.e., user), it only fits the distribution of human acoustic features in general and does not represent any specific person.
In this embodiment, before the audio test starts, a pre-trained general verification model corresponding to the preset wake-up word is obtained and set as the secondary text verification model.
In addition, the electronic device performs audio acquisition through its microphone to capture an audio signal of the test voice, which is recorded as the sample audio signal. The electronic device then extracts acoustic features from the sample audio signal, adapts the general verification model to those features, and sets the adapted general verification model as the secondary voiceprint verification model. The adaptation may be implemented with a maximum a posteriori (MAP) estimation algorithm.
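The text states only that the adaptation may use maximum a posteriori estimation. A common instantiation, assumed here rather than taken from the patent, treats the general verification model as a diagonal-covariance Gaussian mixture and adapts only its component means with a relevance factor r. For each component k with posterior responsibilities γ_k(t) over frames x_t: n_k = Σ_t γ_k(t), E_k = Σ_t γ_k(t) x_t / n_k, α_k = n_k / (n_k + r), and the adapted mean is α_k E_k + (1 - α_k) μ_k. The function name `map_adapt_means` and the default r = 16 are illustrative choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    # Log density of a Gaussian with diagonal covariance.
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(frames: Sequence[Vector], weights: Vector,
                    means: Sequence[Vector], variances: Sequence[Vector],
                    relevance: float = 16.0) -> List[List[float]]:
    """Hypothetical helper: mean-only MAP adaptation of a diagonal GMM (relevance > 0)."""
    k_count, dim = len(weights), len(means[0])
    n = [0.0] * k_count                            # soft frame count per component
    first = [[0.0] * dim for _ in range(k_count)]  # responsibility-weighted feature sums
    for x in frames:
        log_p = [math.log(w) + _log_gauss_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(log_p)                           # subtract the max for numerical stability
        probs = [math.exp(lp - top) for lp in log_p]
        total = sum(probs)
        for k in range(k_count):
            gamma = probs[k] / total               # posterior responsibility of component k
            n[k] += gamma
            for d in range(dim):
                first[k][d] += gamma * x[d]
    adapted = []
    for k in range(k_count):
        alpha = n[k] / (n[k] + relevance)          # more data -> more weight on the new speaker
        expected = [f / n[k] for f in first[k]] if n[k] > 0 else list(means[k])
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(expected, means[k])])
    return adapted
```

Components that see little of the test voice keep means close to the general model, which is the usual motivation for MAP over retraining from scratch on a short enrolment sample.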
In an embodiment, a noise playing device is further provided in the test environment to play sample noise of a preset scene.
In this embodiment, the noise playing device plays sample noise of a preset scene so that the wake-up rate of the electronic device in that scene can be tested. For example, playing subway sample noise through the noise playing device allows the wake-up rate of the electronic device in a subway scene to be tested.
In an embodiment, before "performing audio acquisition through a microphone to obtain an audio signal to be verified", the method further includes:
(1) obtaining a first decibel value of the test voice played by the voice playing device and a second decibel value of the sample noise played by the noise playing device;
(2) when the first decibel value and the second decibel value satisfy a preset test condition, performing audio acquisition through the microphone to obtain the audio signal to be verified.
In this embodiment, to ensure that the audio test runs properly, a certain signal-to-noise ratio must be maintained during the test.
For example, before the test starts, a decibel meter is placed at the position of the electronic device. The decibel meter measures a first decibel value of the test voice played by the voice playing device and a second decibel value of the sample noise played by the noise playing device, and the ratio of the first decibel value to the second decibel value is then calculated and used as the signal-to-noise ratio of the test environment.
Correspondingly, the preset test condition may be that the signal-to-noise ratio of the test environment reaches a preset signal-to-noise ratio, whose value those skilled in the art may choose according to actual needs.
After calculating the signal-to-noise ratio of the test environment from the first and second decibel values, the electronic device determines whether it reaches the preset signal-to-noise ratio; if so, it performs audio acquisition through the microphone to obtain the audio signal to be verified and starts the audio test.
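The precondition check can be sketched in a few lines of Python. The text describes the signal-to-noise ratio as the ratio of the two decibel values; because both readings are already logarithmic levels, this sketch assumes the conventional equivalent, the level difference in dB, which corresponds to the ratio of the underlying sound powers. The function names are hypothetical.

```python
def environment_snr_db(first_db: float, second_db: float) -> float:
    """SNR of the test environment in dB: test-voice level minus sample-noise level."""
    return first_db - second_db


def meets_test_condition(first_db: float, second_db: float, preset_snr_db: float) -> bool:
    # Audio acquisition starts only once the measured SNR reaches the preset value.
    return environment_snr_db(first_db, second_db) >= preset_snr_db


def power_ratio(snr_db: float) -> float:
    # Convert a dB difference back to a linear power ratio.
    return 10.0 ** (snr_db / 10.0)
```

For example, a 65 dB test voice over 60 dB of noise gives a 5 dB SNR, which meets a preset SNR of 5 dB; at 62 dB it would not, and the playback volumes would need adjusting first.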
In one embodiment, "performing primary verification on the audio signal to be verified through the dedicated speech recognition chip" includes:
(1) calling, through the dedicated speech recognition chip, a pre-trained scene classification model to classify the scene of the audio signal to be verified and obtain a scene classification result;
(2) calling, through the dedicated speech recognition chip, a pre-trained primary text verification model corresponding to the scene classification result to verify whether the audio signal to be verified contains the preset wake-up word.
This embodiment is described taking as an example a primary verification by the dedicated speech recognition chip that covers text features.
It should be noted that in this embodiment a scene classification model is trained in advance with a machine learning algorithm on sample audio signals from different known scenes; the scene classification model can classify the scene in which the electronic device is located.
Because the test environment contains both the voice playing device and the noise playing device, the audio signal to be verified that the electronic device collects can be regarded as consisting of two parts: one corresponding to the test voice and one corresponding to the sample noise. Accordingly, during primary verification, the dedicated speech recognition chip calls the pre-trained scene classification model and uses it to classify the audio signal to be verified, obtaining a scene classification result. The scene classification result describes the scene that the noise playing device simulates by playing the sample noise.
It should be noted that in this embodiment a primary text verification model set is preset in the electronic device. The set contains a plurality of primary text verification models corresponding to the preset wake-up word, each trained in advance for a different scene, so that the dedicated speech recognition chip can load the model suited to the current scene and verify more flexibly and accurately whether the acquired audio signal to be verified contains the preset wake-up word.
Correspondingly, after the scene classification result of the audio signal to be verified is obtained, the electronic device uses the dedicated speech recognition chip to call the primary text verification model corresponding to that result from the primary text verification model set and uses it to verify whether the audio signal to be verified contains the preset wake-up word; if so, the audio signal to be verified is determined to pass the primary verification.
For example, referring to fig. 2, the primary text verification model set includes four primary text verification models: model A, suited to audio verification in scene A; model B, suited to scene B; model C, suited to scene C; and model D, suited to scene D. If the scene classification result indicates that the audio signal to be verified corresponds to scene B, the electronic device loads primary text verification model B from the set through the dedicated speech recognition chip; the other scenes are handled in the same way.
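Scene-dependent model selection amounts to a lookup keyed by the scene classification result. In this Python sketch the scene classifier and the primary text verification models are hypothetical callables (the patent does not specify their form), and `threshold_model` is a toy stand-in for a trained model.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Verifier = Callable[[Features], bool]


def primary_verification(features: Features,
                         classify_scene: Callable[[Features], str],
                         model_set: Dict[str, Verifier]) -> bool:
    """Classify the scene, load the matching primary model, and run it."""
    scene = classify_scene(features)
    if scene not in model_set:
        raise ValueError(f"no primary text verification model for scene {scene!r}")
    return model_set[scene](features)


def threshold_model(threshold: float) -> Verifier:
    # Hypothetical stand-in for a trained model: passes when the summed features reach a threshold.
    return lambda f: sum(f) >= threshold


# One model per scene, mirroring models A to D in fig. 2 (thresholds are made up).
model_set = {name: threshold_model(t)
             for name, t in [("A", 1.0), ("B", 2.0), ("C", 3.0), ("D", 4.0)]}
```

With a classifier that returns `"B"`, `primary_verification([1.5, 1.0], classifier, model_set)` runs model B and passes; the same features fail under model D's stricter threshold.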
Referring to figs. 3 and 4 together, fig. 3 is a schematic diagram of a test environment for performing the audio test in this embodiment. As shown in fig. 3, a soundproof test environment is first built. In it, an artificial head serves as the voice playing device for playing the test voice, a loudspeaker serves as the noise playing device for playing the sample noise, and a computer serves as the master control device that controls playback on the artificial head and the loudspeaker. Those skilled in the art determine where to place the electronic device in the test environment according to actual needs and place it there.
The electronic device includes a dedicated speech recognition chip and a processor. During voice wake-up, the dedicated speech recognition chip first performs primary verification, a coarse check, on the collected audio signal. When the primary verification passes, the processor performs secondary verification on the collected audio signal to ensure the accuracy of the overall verification, and when the secondary verification passes, the voice interaction application is woken up to interact with the user by voice. The voice interaction application is also called a voice assistant, such as "kohma".
Under the control of the computer, the artificial head plays, in a loop every 5 seconds, a clean speech signal containing the preset wake-up word, which is recorded as the test voice. Meanwhile, the loudspeaker continuously plays sample noise to simulate a preset scene, so that the wake-up rate of the electronic device in that scene can be verified.
Before the audio test starts, the decibel meter is placed at the position of the electronic device. Through the decibel meter, the electronic device obtains a first decibel value of the test voice played by the artificial head and a second decibel value of the sample noise played by the loudspeaker, and calculates the corresponding signal-to-noise ratio from them. When the signal-to-noise ratio does not reach the preset signal-to-noise ratio, the electronic device sends indication information to the computer, which adjusts the playback volume of the artificial head and/or the loudspeaker. When the signal-to-noise ratio reaches the preset signal-to-noise ratio, the audio test is performed according to the audio test flow shown in fig. 4:
201. The dedicated speech recognition chip performs audio acquisition through the microphone to obtain an audio signal to be verified.
202. The dedicated speech recognition chip loads the primary text wake-up model to verify the audio signal to be verified; if verification passes, go to 203; if it fails, go to 208.
203. The dedicated speech recognition chip provides the audio signal to be verified to the processor and sends first indication information to the preset counting application, instructing it to count so as to obtain the first counting result for the dedicated speech recognition chip.
204. The processor calls the secondary text wake-up model to verify the audio signal to be verified; if verification passes, go to 205; if it fails, go to 208.
205. The processor sends second text indication information to the preset counting application, instructing it to count so as to obtain the second text counting result for the processor.
206. The processor calls the voiceprint wake-up model to verify the audio signal to be verified; if verification passes, go to 207; if it fails, go to 208.
207. The processor sends second voiceprint indication information to the preset counting application, instructing it to count so as to obtain the second voiceprint counting result for the processor.
208. The processor determines whether the number of verifications performed by the dedicated speech recognition chip has reached the preset number; if so, go to 209; otherwise, return to 201.
209. The processor obtains the first wake-up rate of the dedicated speech recognition chip from the first counting result and the preset number, the second text wake-up rate of the processor from the second text counting result and the first counting result, and the second voiceprint wake-up rate of the processor from the second voiceprint counting result and the second text counting result.
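Steps 201 to 209 can be condensed into a single loop. This Python sketch replaces the microphone and the three models with hypothetical callables and computes the rates as simple ratios, which is an assumption; a failed check at any stage falls through to the attempt-count check of step 208.

```python
from typing import Callable, Dict

Check = Callable[[object], bool]


def _rate(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else 0.0


def run_audio_test(preset_times: int, acquire: Callable[[], object],
                   primary_check: Check, text_check: Check,
                   voiceprint_check: Check) -> Dict[str, float]:
    """Hypothetical driver mirroring the fig. 4 flow."""
    counts = {"first": 0, "second_text": 0, "second_voiceprint": 0}
    attempts = 0
    while attempts < preset_times:           # 208: stop after the preset number of primary checks
        signal = acquire()                   # 201: acquire an audio signal to be verified
        attempts += 1
        if not primary_check(signal):        # 202: primary verification on the dedicated chip
            continue
        counts["first"] += 1                 # 203: first indication -> first counting result
        if not text_check(signal):           # 204: secondary text verification on the processor
            continue
        counts["second_text"] += 1           # 205: second text counting result
        if voiceprint_check(signal):         # 206: secondary voiceprint verification
            counts["second_voiceprint"] += 1  # 207: second voiceprint counting result
    return {                                 # 209: derive the wake-up rates
        "first": _rate(counts["first"], preset_times),
        "second_text": _rate(counts["second_text"], counts["first"]),
        "second_voiceprint": _rate(counts["second_voiceprint"], counts["second_text"]),
    }
```

For a quick dry run, feed ten integer "signals" with `it = iter(range(10))` and `acquire=lambda: next(it)`, and use simple predicates as checks; this exercises the counting logic without any audio hardware.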
Referring to fig. 5, fig. 5 is a schematic structural diagram of an audio test apparatus according to an embodiment of the present application. The audio test apparatus can be applied to an electronic device that includes a microphone, a dedicated speech recognition chip, and a processor. The electronic device is placed in a pre-built test environment provided with a voice playing device for playing a test voice, which is a clean speech signal containing a preset wake-up word. The audio test apparatus may include an audio acquisition module 301, a primary verification module 302, a secondary verification module 303, a result obtaining module 304, and a count statistics module 305, wherein:
the audio acquisition module 301 is configured to perform audio acquisition through the microphone to obtain an audio signal to be verified and provide it to the dedicated speech recognition chip;
the primary verification module 302 is configured to perform primary verification on the audio signal to be verified through the dedicated speech recognition chip and, when verification passes, provide the audio signal to the processor and send first indication information to a preset counting application, instructing it to count so as to obtain a first counting result for the dedicated speech recognition chip;
the secondary verification module 303 is configured to perform secondary verification on the audio signal to be verified through the processor and, when verification passes, send second indication information to the preset counting application, instructing it to count so as to obtain a second counting result for the processor;
the result obtaining module 304 is configured to determine whether the number of primary verifications performed has reached a preset number; if so, obtain the first counting result and the second counting result; otherwise, instruct the audio acquisition module 301 to acquire an audio signal to be verified through the microphone again for verification;
the count statistics module 305 is configured to compute a first wake-up rate of the dedicated speech recognition chip from the first counting result and the preset number, and a second wake-up rate of the processor from the first counting result and the second counting result.
In an embodiment, the second indication information includes second text indication information and second voiceprint indication information, the second counting result includes a second text counting result and a second voiceprint counting result, and the second wake-up rate includes a second text wake-up rate and a second voiceprint wake-up rate. When performing secondary verification on the audio signal to be verified through the processor, the secondary verification module 303 is configured to:
call, through the processor, a pre-trained secondary text verification model corresponding to the preset wake-up word to verify whether the audio signal to be verified contains the preset wake-up word, and, if verification passes, send second text indication information to the preset counting application, instructing it to count so as to obtain a second text counting result for the processor;
call, through the processor, a pre-trained secondary voiceprint verification model corresponding to the test voice to verify whether the voiceprint features of the audio signal to be verified match those of the test voice, and, if verification passes, send second voiceprint indication information to the preset counting application, instructing it to count so as to obtain a second voiceprint counting result for the processor;
when computing the second wake-up rate of the processor from the first counting result and the second counting result, the count statistics module 305 is configured to:
compute a second text wake-up rate from the second text counting result and the first counting result, and a second voiceprint wake-up rate from the second voiceprint counting result and the second text counting result.
In an embodiment, the audio test apparatus further includes a model training module configured to perform the following before audio acquisition through the microphone obtains the audio signal to be verified:
obtaining a pre-trained general verification model corresponding to the preset wake-up word, and setting it as the secondary text verification model;
performing audio acquisition through the microphone to obtain a sample audio signal;
extracting acoustic features of the sample audio signal, adapting the general verification model to those acoustic features, and setting the adapted general verification model as the secondary voiceprint verification model.
In an embodiment, a noise playing device is further provided in the test environment to play sample noise of a preset scene.
In an embodiment, before audio acquisition through the microphone obtains the audio signal to be verified, the audio acquisition module 301 is further configured to:
obtain a first decibel value of the test voice played by the voice playing device and a second decibel value of the sample noise played by the noise playing device;
when the first decibel value and the second decibel value satisfy a preset test condition, perform audio acquisition through the microphone to obtain the audio signal to be verified.
In an embodiment, when performing primary verification on the audio signal to be verified through the dedicated speech recognition chip, the primary verification module 302 is configured to:
call, through the dedicated speech recognition chip, a pre-trained scene classification model to classify the scene of the audio signal to be verified and obtain a scene classification result;
call, through the dedicated speech recognition chip, a pre-trained primary text verification model corresponding to the scene classification result to verify whether the audio signal to be verified contains the preset wake-up word, and determine that the primary verification passes if it does.
In an embodiment, the result obtaining module 304 is further configured to, when the primary verification or the secondary verification fails, determine whether the number of primary verifications performed has reached the preset number; if so, obtain the first counting result and the second counting result; otherwise, instruct the audio acquisition module 301 to acquire an audio signal to be verified through the microphone again for verification.
It should be noted that the audio test apparatus provided in this embodiment belongs to the same concept as the audio test method in the above embodiments. Any method provided in the audio test method embodiments can run on the audio test apparatus; its specific implementation is described in detail in those method embodiments and is not repeated here.
Embodiments of the present application further provide a storage medium, on which a computer program is stored, and when the stored computer program is executed on an electronic device provided in an embodiment of the present application, the electronic device is caused to perform the steps in the audio testing method provided in the embodiment of the present application. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
Referring to fig. 6, the electronic device includes a processor 401, a memory 402, a microphone 403, and a dedicated speech recognition chip 404.
The processor 401 in this embodiment is a general-purpose processor, such as an ARM-architecture processor.
The dedicated speech recognition chip 404 is a chip designed specifically for speech recognition, such as a digital signal processing chip or an application-specific integrated circuit designed for speech recognition. Compared with the general-purpose processor 401, it has lower power consumption but relatively weaker processing capability.
The memory 402 stores a computer program and may be a high-speed random access memory or a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 402 may further include a memory controller to provide the processor 401 and the dedicated speech recognition chip 404 with access to the memory 402.
The electronic device is placed in a pre-built test environment provided with a voice playing device for playing a test voice, which is a clean speech signal containing a preset wake-up word.
The processor 401 and the dedicated speech recognition chip 404 are configured to perform the following by calling the computer program in the memory 402:
the dedicated speech recognition chip 404 performs audio acquisition through the microphone 403 to obtain an audio signal to be verified;
the dedicated speech recognition chip 404 performs primary verification on the audio signal to be verified and, when verification passes, provides the audio signal to the processor 401 and sends first indication information to a preset counting application, instructing it to count so as to obtain a first counting result for the dedicated speech recognition chip 404;
the processor 401 performs secondary verification on the audio signal to be verified and, when verification passes, sends second indication information to the preset counting application, instructing it to count so as to obtain a second counting result for the processor 401;
the processor 401 determines whether the number of primary verifications has reached a preset number; if so, it obtains the first counting result and the second counting result; otherwise, it instructs the dedicated speech recognition chip 404 to acquire an audio signal to be verified through the microphone again for verification;
the processor 401 obtains a first wake-up rate of the dedicated speech recognition chip 404 from the first counting result and the preset number, and a second wake-up rate of the processor 401 from the first counting result and the second counting result.
Referring to fig. 7, fig. 7 is another schematic structural diagram of the electronic device according to an embodiment of the present application. It differs from the electronic device shown in fig. 6 in that it further includes components such as an input unit 405 and an output unit 406.
The input unit 405 may be used to receive input digits, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The output unit 406 may be used to display information input by the user or information provided to the user, and may be, for example, a screen.
In this embodiment, the processor 401 and the dedicated speech recognition chip 404 are configured to perform the following by calling the computer program in the memory 402:
the dedicated speech recognition chip 404 performs audio acquisition through the microphone 403 to obtain an audio signal to be verified;
the dedicated speech recognition chip 404 performs primary verification on the audio signal to be verified and, when verification passes, provides the audio signal to the processor 401 and sends first indication information to a preset counting application, instructing it to count so as to obtain a first counting result for the dedicated speech recognition chip 404;
the processor 401 performs secondary verification on the audio signal to be verified and, when verification passes, sends second indication information to the preset counting application, instructing it to count so as to obtain a second counting result for the processor 401;
the processor 401 determines whether the number of primary verifications has reached a preset number; if so, it obtains the first counting result and the second counting result; otherwise, it instructs the dedicated speech recognition chip 404 to acquire an audio signal to be verified through the microphone again for verification;
the processor 401 obtains a first wake-up rate of the dedicated speech recognition chip 404 from the first counting result and the preset number, and a second wake-up rate of the processor 401 from the first counting result and the second counting result.
In an embodiment, the second indication information includes second text indication information and second voiceprint indication information, the second counting result includes a second text counting result and a second voiceprint counting result, and the second wake-up rate includes a second text wake-up rate and a second voiceprint wake-up rate. When performing secondary verification on the audio signal to be verified, the processor 401 is configured to perform:
calling a pre-trained secondary text verification model corresponding to a preset wake-up word to verify whether the audio signal to be verified includes the preset wake-up word, and if the verification passes, sending second text indication information to the preset counting application, instructing the preset counting application to count so as to obtain a second text counting result corresponding to the processor 401;
calling a pre-trained secondary voiceprint verification model corresponding to the test voice to verify whether the voiceprint features of the audio signal to be verified match the voiceprint features of the test voice, and if the verification passes, sending second voiceprint indication information to the preset counting application, instructing the preset counting application to count so as to obtain a second voiceprint counting result corresponding to the processor 401;
when obtaining the second wake-up rate of the processor 401 from the first counting result and the second counting result, the processor 401 is configured to perform:
obtaining the second text wake-up rate from the second text counting result and the first counting result, and obtaining the second voiceprint wake-up rate from the second voiceprint counting result and the second text counting result.
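Read as ratios, the two sub-rates form a cascade: the text rate is conditioned on the chip having passed the signal, and the voiceprint rate is conditioned on the text check having passed it, which suggests the voiceprint result only counts for signals that passed the text check. A minimal Python sketch under that assumption; `Counts`, `second_wake_up_rates` and the ratio form of each rate are hypothetical, not formulas given in the source:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Tallies held by the preset counting application (hypothetical names)."""
    first: int              # passed primary verification on the dedicated chip
    second_text: int        # passed the processor's secondary text check
    second_voiceprint: int  # passed the processor's secondary voiceprint check


def _ratio(num: int, den: int) -> float:
    # Report 0.0 instead of dividing by zero when an earlier stage passed nothing.
    return num / den if den else 0.0


def second_wake_up_rates(c: Counts) -> dict[str, float]:
    return {
        # Text rate: conditioned on the chip having passed the signal.
        "second_text": _ratio(c.second_text, c.first),
        # Voiceprint rate: conditioned on the text check having passed it.
        "second_voiceprint": _ratio(c.second_voiceprint, c.second_text),
    }


print(second_wake_up_rates(Counts(first=80, second_text=60, second_voiceprint=45)))
```

With 80 chip passes, 60 text passes and 45 voiceprint passes, both sub-rates come out as 60/80 = 45/60 = 0.75.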
In an embodiment, before the audio signal to be verified is obtained by audio collection through the microphone, the processor 401 is further configured to perform:
acquiring a pre-trained general verification model corresponding to the preset wake-up word, and setting the general verification model as the secondary text verification model;
collecting audio through the microphone 403 to obtain a sample audio signal;
extracting acoustic features of the sample audio signal, performing adaptive processing on the acoustic features based on the general verification model, and setting the adapted general verification model as the secondary voiceprint verification model.
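The source does not say which adaptation technique turns the general verification model into the speaker-specific voiceprint model. One common choice for this kind of step is maximum a posteriori (MAP) adaptation of the component means of a Gaussian mixture model, as used in GMM-UBM speaker verification. A minimal pure-Python sketch, assuming the general model is a diagonal-covariance GMM and that only its means are adapted; `DiagGMM`, `map_adapt_means` and the relevance factor of 16 are illustrative assumptions, not details from the source:

```python
from __future__ import annotations

import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DiagGMM:
    """Hypothetical stand-in for the general verification model."""
    weights: list[float]          # mixture weights, summing to 1
    means: list[list[float]]      # one mean vector per component
    variances: list[list[float]]  # diagonal variances per component


def _log_gauss(x: list[float], mean: list[float], var: list[float]) -> float:
    # Log density of a Gaussian with diagonal covariance.
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(ubm: DiagGMM, frames: list[list[float]],
                    relevance: float = 16.0) -> DiagGMM:
    """MAP-adapt the component means of `ubm` towards the sample frames."""
    k_count, dim = len(ubm.weights), len(ubm.means[0])
    n = [0.0] * k_count                              # soft frame counts
    sums = [[0.0] * dim for _ in range(k_count)]     # posterior-weighted sums
    for x in frames:
        logs = [math.log(w) + _log_gauss(x, m, v)
                for w, m, v in zip(ubm.weights, ubm.means, ubm.variances)]
        top = max(logs)
        probs = [math.exp(lp - top) for lp in logs]  # shift by max for stability
        total = sum(probs)
        for k, p in enumerate(probs):
            gamma = p / total                        # posterior of component k
            n[k] += gamma
            for d in range(dim):
                sums[k][d] += gamma * x[d]
    new_means = []
    for k in range(k_count):
        if n[k] == 0.0:
            new_means.append(list(ubm.means[k]))     # no data: keep the prior mean
            continue
        alpha = n[k] / (n[k] + relevance)            # more data, larger shift
        new_means.append([alpha * sums[k][d] / n[k] + (1 - alpha) * ubm.means[k][d]
                          for d in range(dim)])
    return replace(ubm, means=new_means)
```

The relevance factor controls how far the means move: components that see many sample frames shift towards the speaker's data, while components that see few stay close to the general model, which is what lets a short enrollment recording produce a usable voiceprint model.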
In an embodiment, a noise playback device is further provided in the test scene, and the noise playback device is configured to play sample noise of a preset scene.
In an embodiment, before the audio signal to be verified is obtained by audio collection through the microphone, the dedicated speech recognition chip 404 is further configured to perform:
acquiring a first decibel value at which the voice playback device plays the test voice and a second decibel value at which the noise playback device plays the sample noise;
when the first decibel value and the second decibel value satisfy the preset test condition, performing audio collection through the microphone 403 to obtain the audio signal to be verified.
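The source leaves the "preset test condition" open. One plausible form, stated purely as an assumption, bounds the test-voice level and requires the difference between the two decibel values (the signal-to-noise ratio, if both are measured on the same dB scale at the device) to sit near a target. `PresetTestCondition`, `condition_met` and all threshold values below are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetTestCondition:
    """Hypothetical thresholds; the source leaves the condition unspecified."""
    speech_db_min: float           # lowest acceptable test-voice level
    speech_db_max: float           # highest acceptable test-voice level
    target_snr_db: float           # desired speech-to-noise difference
    snr_tolerance_db: float = 1.0  # allowed deviation from the target


def condition_met(speech_db: float, noise_db: float,
                  cond: PresetTestCondition) -> bool:
    # The test voice must be played within the configured level window.
    if not cond.speech_db_min <= speech_db <= cond.speech_db_max:
        return False
    # With both levels on the same dB scale at the device, their difference
    # is the signal-to-noise ratio in dB.
    snr_db = speech_db - noise_db
    return abs(snr_db - cond.target_snr_db) <= cond.snr_tolerance_db


cond = PresetTestCondition(speech_db_min=60.0, speech_db_max=70.0, target_snr_db=10.0)
print(condition_met(65.0, 55.0, cond))  # prints True (SNR of 10 dB)
```

Audio collection would only start once this check returns True, so every trial in a run is recorded at the same nominal signal-to-noise ratio.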
In one embodiment, when performing primary verification on the audio signal to be verified, the dedicated speech recognition chip 404 is configured to perform:
calling a pre-trained scene classification model to perform scene classification on the audio signal to be verified to obtain a scene classification result;
calling a pre-trained primary text verification model corresponding to the scene classification result to verify whether the audio signal to be verified includes the preset wake-up word, and if so, determining that the audio signal to be verified passes the primary verification.
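Primary verification on the chip is a two-step dispatch: the scene label selects which wake-word model runs. A minimal sketch of that routing, assuming each model can be treated as a function from audio samples to a pass/fail decision; `primary_verify`, `toy_classifier` and `toy_models` are hypothetical, and the toy threshold rules merely stand in for the pre-trained scene classification and primary text verification models:

```python
from typing import Callable, Mapping, Sequence

Audio = Sequence[float]
TextModel = Callable[[Audio], bool]


def primary_verify(audio: Audio,
                   classify_scene: Callable[[Audio], str],
                   text_models: Mapping[str, TextModel]) -> bool:
    """Pick the wake-word model trained for the detected scene and let it decide."""
    scene = classify_scene(audio)   # e.g. "quiet" or "noisy"
    model = text_models[scene]      # raises KeyError if no model exists for the scene
    return model(audio)


# Toy stand-ins: a loudness-based "classifier" and two threshold "models".
def toy_classifier(audio: Audio) -> str:
    return "noisy" if max(abs(s) for s in audio) > 0.5 else "quiet"


toy_models = {
    "quiet": lambda audio: sum(abs(s) for s in audio) > 0.1,
    "noisy": lambda audio: sum(abs(s) for s in audio) > 2.0,
}
print(primary_verify([0.05, 0.1], toy_classifier, toy_models))  # prints True
```

Keeping one model per scene lets each model be tuned to its own noise conditions, at the cost of the final decision depending on the classifier picking the right scene.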
In an embodiment, when the primary verification or the secondary verification fails, the processor 401 proceeds directly to determining whether the number of primary verifications performed has reached the preset number of times; if so, it obtains the first counting result and the second counting result; otherwise, it instructs the dedicated speech recognition chip 404 to acquire an audio signal to be verified again through the microphone for verification.
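The control flow of the test, including the jump back to the count check on any failure, can be simulated with stub verifiers. A minimal sketch, assuming the counting application only needs to keep two tallies and that the rates are plain ratios; `run_wake_up_test` and its callable parameters are hypothetical stand-ins for the microphone, the dedicated chip and the processor:

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Callable


def run_wake_up_test(preset_times: int,
                     capture: Callable[[], Any],
                     chip_verify: Callable[[Any], bool],
                     processor_verify: Callable[[Any], bool]) -> tuple[float, float]:
    if preset_times <= 0:
        raise ValueError("preset_times must be positive")
    counting_app = Counter()            # stand-in for the preset counting application
    primary_runs = 0
    while primary_runs < preset_times:  # "has the preset number been reached?"
        audio = capture()               # (re)acquire through the microphone
        primary_runs += 1
        if not chip_verify(audio):
            continue                    # primary failed: go straight to the count check
        counting_app["first"] += 1      # first indication information
        if processor_verify(audio):
            counting_app["second"] += 1  # second indication information
        # a secondary failure also falls through to the count check
    first, second = counting_app["first"], counting_app["second"]
    return first / preset_times, (second / first if first else 0.0)


samples = iter(range(10))  # deterministic fake "captures" 0..9
print(run_wake_up_test(10, lambda: next(samples),
                       chip_verify=lambda a: a < 8,
                       processor_verify=lambda a: a < 6))  # prints (0.8, 0.75)
```

Because the loop counts primary verifications rather than successes, the number of captures is always exactly the preset number of times, whatever the pass rate.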
It should be noted that the electronic device provided in the embodiments of the present application and the audio testing method in the above embodiments belong to the same concept: any method provided in the audio testing method embodiments can be run on the electronic device, and its specific implementation is described in detail in those embodiments and is not repeated here.
It should be noted that, for the audio testing method of the embodiments of the present application, a person skilled in the art will understand that all or part of the process can be completed by a computer program controlling the relevant hardware. The computer program can be stored in a computer-readable storage medium, such as a memory of the electronic device, and executed by the processor and the dedicated speech recognition chip in the electronic device; its execution can include the process of the audio testing method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
The audio testing method, storage medium and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may make changes to the specific embodiments and the scope of application according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (9)

1. An audio testing method, applied to an electronic device, characterized in that the electronic device comprises a microphone, a dedicated speech recognition chip and a processor, the electronic device is placed in a pre-built test environment, a voice playback device for playing a test voice is provided in the test environment, and the test voice is a clean speech signal that includes a preset wake-up word; the audio testing method comprises:
performing audio collection through the microphone to obtain an audio signal to be verified, and providing the audio signal to be verified to the dedicated speech recognition chip;
calling, through the dedicated speech recognition chip, a pre-trained scene classification model to perform scene classification on the audio signal to be verified to obtain a scene classification result, and calling, through the dedicated speech recognition chip, a pre-trained primary text verification model corresponding to the scene classification result to verify whether the audio signal to be verified includes the preset wake-up word; if so, providing the audio signal to be verified to the processor, and sending first indication information to a preset counting application, instructing the preset counting application to count so as to obtain a first counting result corresponding to the dedicated speech recognition chip;
performing, through the processor, secondary verification on the audio signal to be verified, and, when the verification passes, sending second indication information to the preset counting application, instructing the preset counting application to count so as to obtain a second counting result corresponding to the processor;
determining whether the number of primary verifications performed has reached a preset number of times; if not, collecting an audio signal to be verified again through the microphone for verification; if so, acquiring the first counting result and the second counting result;
obtaining a first wake-up rate of the dedicated speech recognition chip statistically from the first counting result and the preset number of times, and obtaining a second wake-up rate of the processor statistically from the first counting result and the second counting result.

2. The audio testing method according to claim 1, characterized in that the second indication information comprises second text indication information and second voiceprint indication information, the second counting result comprises a second text counting result and a second voiceprint counting result, and the second wake-up rate comprises a second text wake-up rate and a second voiceprint wake-up rate; the performing, through the processor, secondary verification on the audio signal to be verified comprises:
calling, through the processor, a pre-trained secondary text verification model corresponding to the preset wake-up word to verify whether the audio signal to be verified includes the preset wake-up word; if so, sending the second text indication information to the preset counting application, instructing the preset counting application to count so as to obtain a second text counting result corresponding to the processor;
calling, through the processor, a pre-trained secondary voiceprint verification model corresponding to the test voice to verify whether the voiceprint features of the audio signal to be verified match the voiceprint features of the test voice; if so, sending the second voiceprint indication information to the preset counting application, instructing the preset counting application to count so as to obtain a second voiceprint counting result corresponding to the processor;
the obtaining a second wake-up rate of the processor statistically from the first counting result and the second counting result comprises:
obtaining the second text wake-up rate statistically from the second text counting result and the first counting result, and obtaining the second voiceprint wake-up rate statistically from the second voiceprint counting result and the second text counting result.

3. The audio testing method according to claim 2, characterized in that, before the performing audio collection through the microphone to obtain an audio signal to be verified, the method further comprises:
obtaining a pre-trained general verification model corresponding to the preset wake-up word, and setting the general verification model as the secondary text verification model;
performing audio collection through the microphone to obtain a sample audio signal;
extracting acoustic features of the sample audio signal, performing adaptive processing on the acoustic features based on the general verification model, and setting the adaptively processed general verification model as the secondary voiceprint verification model.

4. The audio testing method according to any one of claims 1-3, characterized in that a noise playback device is further provided in the test scene, and the noise playback device is used to play sample noise of a preset scene.

5. The audio testing method according to claim 4, characterized in that, before the performing audio collection through the microphone to obtain an audio signal to be verified, the method further comprises:
acquiring a first decibel value at which the voice playback device plays the test voice, and acquiring a second decibel value at which the noise playback device plays the sample noise;
when the first decibel value and the second decibel value satisfy a preset test condition, performing audio collection through the microphone to obtain the audio signal to be verified.

6. The audio testing method according to any one of claims 1-3, characterized in that the audio testing method further comprises: when the primary verification or the secondary verification fails, proceeding to the step of determining whether the number of primary verifications performed has reached the preset number of times.

7. An audio testing apparatus, applied to an electronic device, characterized in that the electronic device comprises a microphone, a dedicated speech recognition chip and a processor, the electronic device is placed in a pre-built test environment, a voice playback device for playing a test voice is provided in the test environment, and the test voice is a clean speech signal that includes a preset wake-up word; the audio testing apparatus comprises:
an audio acquisition module, configured to perform audio collection through the microphone to obtain an audio signal to be verified, and to provide the audio signal to be verified to the dedicated speech recognition chip;
a primary verification module, configured to call, through the dedicated speech recognition chip, a pre-trained scene classification model to perform scene classification on the audio signal to be verified to obtain a scene classification result, and to call, through the dedicated speech recognition chip, a pre-trained primary text verification model corresponding to the scene classification result to verify whether the audio signal to be verified includes the preset wake-up word; if so, to provide the audio signal to be verified to the processor and send first indication information to a preset counting application, instructing the preset counting application to count so as to obtain a first counting result corresponding to the dedicated speech recognition chip;
a secondary verification module, configured to perform, through the processor, secondary verification on the audio signal to be verified, and, when the verification passes, to send second indication information to the preset counting application, instructing the preset counting application to count so as to obtain a second counting result corresponding to the processor;
a result acquisition module, configured to determine whether the number of primary verifications performed has reached a preset number of times; if not, to collect an audio signal to be verified again through the microphone for verification; if so, to acquire the first counting result and the second counting result;
a counting statistics module, configured to obtain a first wake-up rate of the dedicated speech recognition chip statistically from the first counting result and the preset number of times, and to obtain a second wake-up rate of the processor statistically from the first counting result and the second counting result.

8. An electronic device, characterized by comprising a microphone, a dedicated speech recognition chip, a processor and a memory, wherein a computer program is stored in the memory, the power consumption of the dedicated speech recognition chip is lower than that of the processor, and the computer program, when invoked by the dedicated speech recognition chip and the processor, is used to execute the audio testing method according to any one of claims 1-6.

9. A storage medium on which a computer program is stored, characterized in that the computer program is loaded by a processor and a dedicated speech recognition chip to execute the audio testing method according to any one of claims 1-6.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910818540.3A (CN110602624B) | 2019-08-30 | 2019-08-30 | Audio test method, device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910818540.3A (CN110602624B) | 2019-08-30 | 2019-08-30 | Audio test method, device, storage medium and electronic equipment

Publications (2)

Publication Number | Publication Date
CN110602624A | 2019-12-20
CN110602624B | 2021-05-25

Family ID: 68856538

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910818540.3A (CN110602624B, Expired - Fee Related) | Audio test method, device, storage medium and electronic equipment | 2019-08-30 | 2019-08-30

Country Status (1)

Country | Link
CN (1) | CN110602624B (en)

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
CF01 | Termination of patent right due to non-payment of annual fee (granted publication date: 2021-05-25)

