Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any indication of the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise.
A voiceprint is an important speech feature that can be used to distinguish and identify different users, and it has the characteristics of specificity and relative stability. After a person reaches adulthood, his or her voice remains relatively stable over a long period. Voiceprint recognition is a biometric technology that automatically identifies a speaker from voice parameters in the speech waveform that reflect the speaker's physiological and behavioral characteristics. The voiceprint recognition technology of this application can be applied to waking an intelligent robot.
Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a voiceprint recognition method according to the present application, where the voiceprint recognition method specifically includes the following steps:
S1, establishing a voiceprint library containing voiceprint features.
In this embodiment, presetting a voiceprint library containing voiceprint features improves the accuracy of the robot in the subsequent voice recognition process. Because the ambient voice data (noise) is removed, the voiceprint library has a higher signal-to-noise ratio than a recording that includes noise. Referring to fig. 2, step S1 further includes the following sub-steps:
S11, collecting voice data, wherein the voice data includes ambient voice data and human voice data.
In this embodiment, voice data including human voice data first needs to be collected, and the collection environment is generally required to be relatively quiet, which reduces the influence of noise in the collected recording. Optionally, in this embodiment, the collected voice data may be regarded as containing only human voice data and ambient voice data (noise). The human voice data may come from a single person, that is, the robot identifies the voiceprint of only that person. Of course, in other embodiments, the human voice data may come from multiple persons, that is, the robot may recognize the voiceprints of multiple persons.
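By way of illustration only, the following minimal sketch records one enrollment clip; it assumes the Python `sounddevice` and `soundfile` packages and a default microphone, none of which are prescribed by this embodiment.

```python
# Minimal collection sketch for step S11. Assumptions: the Python
# packages `sounddevice` and `soundfile` are installed and a default
# microphone is available; this embodiment does not prescribe any
# particular capture hardware or software.
import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 16000  # 16 kHz is a common rate for speech processing
DURATION_S = 5       # length of one enrollment recording, in seconds

def collect_voice_data(path: str) -> None:
    """Record a short clip, preferably in a quiet environment."""
    recording = sd.rec(int(DURATION_S * SAMPLE_RATE),
                       samplerate=SAMPLE_RATE, channels=1)
    sd.wait()  # block until the recording finishes
    sf.write(path, recording, SAMPLE_RATE)

collect_voice_data("enrollment_clip.wav")
```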
S12, filtering and denoising the voice data to obtain the human voice data.
In practice, when voice data is collected, the environment may be filled with various ambient noises, such as the noise of a running refrigerator, of passing vehicles, of an operating air conditioner, and some human-voice noise (i.e., speech from persons whose voiceprints the robot does not recognize); these ambient noises are often captured along with the voice data. To reduce the influence of ambient noise on subsequent voiceprint extraction and speech recognition, in this embodiment the collected voice data is filtered and denoised so as to filter out the ambient voice data, i.e., the noise portion. Optionally, the filtering and denoising may be implemented by hardware or by a software algorithm, which a technician may select according to the actual situation; this is not further limited herein.
Optionally, in a specific filtering and denoising process, if the robot identifies only one voiceprint, the other human voices in the collected voice data may be filtered out as well; correspondingly, if the robot can identify a plurality of voiceprints, a plurality of pieces of human voice data are obtained after the filtering and denoising process.
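As one illustrative possibility only, the sketch below stands in for the unspecified filtering and denoising of step S12 with a simple Butterworth band-pass filter restricted to the speech band; a real system might instead use dedicated hardware or a more sophisticated noise-suppression algorithm, as noted above.

```python
# Illustrative denoising sketch for step S12. Assumption: a 4th-order
# Butterworth band-pass over the classic telephony speech band stands
# in for the filtering/denoising method, which this embodiment
# deliberately leaves open.
import numpy as np
from scipy.signal import butter, sosfilt

def bandpass_speech(signal: np.ndarray, fs: int,
                    low_hz: float = 300.0,
                    high_hz: float = 3400.0) -> np.ndarray:
    """Attenuate energy outside the speech band to suppress ambient noise."""
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos, signal)

# Usage: human_voice = bandpass_speech(raw_voice_data, fs=16000)
```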
Optionally, after the collected voice data is filtered and denoised to obtain human voice data with relatively little ambient noise, step S13 may be performed.
S13, extracting the voiceprint features from the human voice data to obtain a voiceprint library.
Optionally, after the human voice data with relatively little ambient noise is obtained, feature extraction needs to be performed on the processed human voice data. The task of feature extraction is to extract and select acoustic or linguistic features that, for a speaker's voiceprint, have strong separability, high stability, and similar characteristics. Optionally, in this embodiment, any of the following approaches may be used:
1. Spectral envelope parameters: the speech signal is passed through a filter bank, the filter outputs are sampled at a suitable rate, and the samples are used as voiceprint features.
2. Parameters based on the physiological structure of the vocal organs (such as the glottis, vocal tract, and nasal cavity) are used as voiceprint features; for example, the pitch contour, formant frequencies and bandwidths, and their trajectories contained in the speech signal are extracted as voiceprint features.
3. Linear prediction coefficients (LPC): various parameters derived from linear prediction, such as the linear prediction coefficients, autocorrelation coefficients, reflection coefficients, log area ratios, the linear prediction residual, and combinations thereof, are used as voiceprint features.
4. Parameters reflecting auditory characteristics, which simulate the human ear's perception of sound frequency, including Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction coefficients (PLP), deep features (Deep Feature), power-normalized cepstral coefficients (PNCC), and the like.
5. Lexical features: speaker-dependent word n-grams and phoneme n-grams.
6. Prosodic features: pitch and energy contours described by n-grams.
7. Language, dialect and accent information, channel information, etc.
Of course, combining different features can improve the performance of a practical system; the effect is better when the correlations among the combined parameters are small, because the parameters then reflect different aspects of the speech signal. The method may use any one of the above approaches, or a combination thereof, to extract the voiceprint features from the human voice data, which is not specifically limited here. Optionally, after the voiceprint features of the human voice data are extracted, a voiceprint model library is constructed and stored in a processing center or control center of the robot. The voiceprint library may include the voiceprint feature of only one person or the voiceprint features of multiple persons; a minimal extraction sketch follows.
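As a minimal sketch of approach 4 above, the following extracts MFCCs with the `librosa` package and averages them over time into one fixed-length vector per speaker; the time-averaging is a deliberate simplification of the speaker models this embodiment leaves open, and the file names are hypothetical.

```python
# Feature-extraction sketch for step S13 (approach 4: MFCC).
# Assumptions: the Python package `librosa` is installed, and averaging
# per-frame MFCCs into a single vector is a simplified stand-in for a
# full speaker model.
import numpy as np
import librosa

def extract_voiceprint(path: str, n_mfcc: int = 20) -> np.ndarray:
    """Return one fixed-length voiceprint feature for a speech clip."""
    y, sr = librosa.load(path, sr=16000)                     # mono, 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                 # collapse over time

# A voiceprint library keyed by speaker, as built in step S13
# (file names are hypothetical):
voiceprint_library = {
    "alice": extract_voiceprint("alice_enroll.wav"),
    "bob":   extract_voiceprint("bob_enroll.wav"),
}
```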
Optionally, the establishment of the voiceprint library may be repeated multiple times; that is, when voice data of the same person or of multiple persons is collected, the collection environment may be varied, or voice data may be collected while the speaker is in different states (such as having a cold or undergoing emotional changes), so as to improve the accuracy of the robot's subsequent voiceprint recognition.
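Continuing the sketch above, repeated enrollment could be folded into the library by averaging the features from several recordings of the same speaker taken in different environments or states; this pooling rule is an assumption for illustration, not a method prescribed by the embodiment.

```python
# Repeated-enrollment sketch: pool features from several clips of the
# same speaker (different rooms, having a cold, different moods, ...).
# Averaging is an illustrative assumption, not a prescribed method;
# `extract_voiceprint` and `voiceprint_library` come from the sketch above.
def enroll_speaker(paths: list) -> np.ndarray:
    feats = [extract_voiceprint(p) for p in paths]
    return np.mean(feats, axis=0)

voiceprint_library["alice"] = enroll_speaker(
    ["alice_quiet.wav", "alice_cold.wav", "alice_outdoors.wav"])
```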
S2, acquiring a first voiceprint feature in the current voice data.
In the robot's subsequent speech recognition process, the current voice data must first be acquired so that voiceprint matching can be performed on it. Referring to fig. 3, step S2 further includes the following sub-steps:
S21, processing the current voice data to obtain first human voice data.
Similarly, after the current voice data is obtained, it needs to be filtered and denoised to obtain the first human voice data. The term "first" is used for descriptive purposes only; that is, the first human voice data may include the voice data of a plurality of speakers, or of only one speaker.
Optionally, in this embodiment, for details of the filtering and denoising of the current voice data, reference may be made to the foregoing description, which is not repeated here.
S22, extracting the voiceprint features from the first human voice data to obtain a first voiceprint feature.
Optionally, after the first human voice data with relatively little ambient noise is obtained, feature extraction is further performed on the processed first human voice data to obtain the first voiceprint feature. Similarly, "first" is used for descriptive purposes only; that is, the first voiceprint feature may include the voiceprint features of a plurality of speakers, or of only one speaker. For the detailed voiceprint extraction process, reference may likewise be made to the foregoing description, which is not repeated here.
S3, judging whether the first voiceprint feature matches a voiceprint feature in the voiceprint library.
After the first voiceprint feature of the current voice data is acquired, it needs to be matched against the voiceprint features in the voiceprint library. Many matching methods exist, such as probability statistics, dynamic time warping, vector quantization, hidden Markov models, and artificial neural networks. The present application may adopt any of these methods to match the first voiceprint feature against the voiceprint library and thereby determine whether the first voiceprint feature exists in the library; if so, step S4 is executed, and if not, the flow returns to step S2. Of course, in other embodiments, the first voiceprint feature may be matched against the voiceprint library in other manners, which is not further limited herein. A minimal matching sketch follows.
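The sketch below uses cosine similarity with a fixed threshold as a deliberately minimal stand-in for the probability-statistics, dynamic-time-warping, vector-quantization, hidden-Markov-model, or neural-network matchers listed above; the threshold value is hypothetical and would be tuned empirically.

```python
# Matching sketch for step S3. Assumption: cosine similarity against
# the averaged-MFCC voiceprints of the earlier sketches, with a
# hypothetical threshold, stands in for the matchers named above.
import numpy as np

MATCH_THRESHOLD = 0.85  # hypothetical; tuned empirically in practice

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_voiceprint(first_feature: np.ndarray, library: dict):
    """Return the best-matching speaker name, or None (return to S2)."""
    best_name, best_score = None, MATCH_THRESHOLD
    for name, enrolled in library.items():
        score = cosine_similarity(first_feature, enrolled)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```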
S4, if matched, extracting the first human voice data from the current voice data for voice recognition.
In step S4, when it is determined that a voiceprint feature matching the first voiceprint feature of the first human voice data in the current voice data exists in the voiceprint library, the first human voice data may be extracted from the current voice data for voice recognition, so as to wake the robot.
Optionally, if the current voice data and the voiceprint library each involve a plurality of persons, that is, the robot can recognize the voice data of a plurality of persons, the robot may be awakened by setting wake-up voice data: the acquired voice data includes a keyword or wake phrase for awakening the robot, and the robot is awakened after recognizing that keyword or wake phrase.
In a specific application scenario of the present application, when the robot recognizes that the current voice data contains a plurality of pieces of human voice data matching the voiceprint library, it may further process the first human voice data and extract the piece containing the wake-up voice data in order to wake the robot. For example, suppose the collected first human voice data comes from three different speakers saying "welcome", "turn on the air conditioner", and "start program" respectively, and the first voiceprint features extracted from all three exist in the voiceprint library; if the wake-up voice data is "start program", the human voice data containing "start program" is automatically extracted, thereby waking the robot.
In this embodiment, when the robot recognizes that the current voice data contains a plurality of pieces of human voice data matching the voiceprint library, it may give priority to recognizing the wake-up voice data (the keyword or wake phrase) in order to wake the robot, as in the sketch below.
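A sketch of this selection, assuming a hypothetical `transcribe` function that stands in for whatever speech recognizer the robot uses, and "start program" as the wake phrase from the example above:

```python
# Wake-up selection sketch: among clips whose voiceprints matched the
# library, keep the one containing the wake phrase. The `transcribe`
# function is hypothetical and stands in for the robot's recognizer.
WAKE_PHRASE = "start program"   # the wake-up voice data from the example

def transcribe(clip) -> str:
    """Placeholder for the robot's speech recognizer."""
    raise NotImplementedError

def select_wake_clip(matched_clips):
    for clip in matched_clips:
        if WAKE_PHRASE in transcribe(clip).lower():
            return clip   # this clip wakes the robot
    return None           # no wake phrase heard; the robot stays asleep
```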
If it is judged that the first voiceprint feature in the current voice data does not match any voiceprint feature in the voiceprint library, the flow returns to step S2 to continue acquiring a first voiceprint feature from the current voice data.
In the above embodiment, the accuracy and the signal-to-noise ratio of the subsequent voice recognition can be improved by comparing and matching the first voiceprint feature in the obtained current voice data with the voiceprint feature in the preset voiceprint library.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an embodiment of a voiceprint recognition device according to the present application. As shown in fig. 4, the device includes a processor 11 and a memory 12, and the processor 11 is connected to the memory 12.
The processor 11 is configured to acquire a first voiceprint feature in current voice data; judge whether the first voiceprint feature matches a voiceprint feature in the voiceprint library; and, if so, extract the first human voice data from the current voice data for voice recognition.
The processor 11 is further configured to establish a voiceprint library containing voiceprint features.
The processor 11 may also be referred to as a CPU (Central Processing Unit). The processor 11 may be an integrated circuit chip having signal processing capabilities. The processor 11 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The processor in the device may execute the corresponding steps in the method embodiments described above; details are not repeated here, and reference may be made to the description of the corresponding steps.
In the above embodiment, the accuracy and the signal-to-noise ratio of the subsequent voice recognition can be improved by comparing and matching the first voiceprint feature in the obtained current voice data with the voiceprint feature in the preset voiceprint library.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of a storage device according to the present application. The storage device of the present application stores a program file 21 capable of implementing all of the methods described above. The program file 21 may be stored in the storage device in the form of a software product and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage device includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, or terminal devices such as a computer, a server, a mobile phone, or a tablet.
In summary, it will be readily understood by those skilled in the art that the present application provides a voiceprint recognition method, a voiceprint recognition device, and a storage device; by comparing and matching the first voiceprint feature in the acquired current voice data against the voiceprint features in the preset voiceprint library, the accuracy and signal-to-noise ratio of subsequent voice recognition can be improved.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.