CN110808052A

Info

Publication number: CN110808052A
Application number: CN201911103239.0A
Authority: CN
Inventors: 郑楚升
Original assignee: Shenzhen Rui Yun Technology Co Ltd
Current assignee: Shenzhen Rui Yun Technology Co Ltd
Priority date: 2019-11-12
Filing date: 2019-11-12
Publication date: 2020-02-18

Abstract

The invention provides a voice recognition method, a voice recognition device and electronic equipment. The voice recognition method comprises the following steps: acquiring voice data of a user and age information of the user; determining the frequency range information of the voice of the user according to the age information; dividing the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information; setting a filtering algorithm corresponding to each sub-frequency band; filtering the voice data of the corresponding sub-frequency band by adopting a corresponding filtering algorithm to obtain the filtered optimized voice data; and performing voice recognition on the optimized voice data. The invention can improve the accuracy of voice recognition.

Description

Voice recognition method and device and electronic equipment

Technical Field

The invention relates to the field of voice recognition, in particular to a voice recognition method, a voice recognition device and electronic equipment.

Background

Pitch is the perceived highness or lowness of a sound and depends on frequency. A long, thick sounding body vibrates slowly, whereas a short, thin sounding body vibrates quickly. The pitch of a person's voice is related to the length, thickness and tension of the vocal cords. Because the voices of children, middle-aged adults and elderly people differ from one another, a common speech recognition model has a high error rate when it is used to recognize the voices of users of different ages, which reduces recognition accuracy.

Disclosure of Invention

The embodiment of the invention provides a voice recognition method, a voice recognition device and electronic equipment, which have the beneficial effect of improving the recognition accuracy.

The invention provides a voice recognition method, which comprises the following steps:

acquiring voice data of a user and age information of the user;

determining the frequency range information of the voice of the user according to the age information;

dividing the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information;

setting a filtering algorithm corresponding to each sub-frequency band;

filtering the voice data of the corresponding sub-frequency band by adopting a corresponding filtering algorithm to obtain the filtered optimized voice data;

and performing voice recognition on the optimized voice data.

The present invention also provides a speech recognition apparatus comprising:

the first acquisition module is used for acquiring voice data of a user and age information of the user;

the second acquisition module is used for determining the frequency range information of the voice of the user according to the age information;

a dividing module, configured to divide a corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information;

the setting module is used for setting a filtering algorithm corresponding to each sub-frequency band;

the filtering module is used for filtering the voice data of the corresponding sub-frequency band by adopting a corresponding filtering algorithm to obtain the filtered optimized voice data;

and the voice recognition module is used for carrying out voice recognition on the optimized voice data.

The invention also provides an electronic device comprising a processor and a memory, wherein a computer program is stored in the memory, and the processor is configured to execute the voice recognition method described above by calling the computer program stored in the memory.

The invention acquires the voice data of the user and the age information of the user; determines the frequency range information of the voice of the user according to the age information; divides the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information; sets a filtering algorithm corresponding to each sub-frequency band; filters the voice data of the corresponding sub-frequency band by adopting the corresponding filtering algorithm to obtain the filtered optimized voice data; and performs voice recognition on the optimized voice data. Because the voice data is filtered according to the age information before voice recognition is performed, both the accuracy and the efficiency of voice recognition are improved.

Drawings

FIG. 1 is a flowchart of a speech recognition method according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a speech recognition method according to a second embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a speech recognition apparatus according to a third embodiment of the present invention;

FIG. 4 is a schematic diagram of a preferred structure of a speech recognition apparatus according to a third embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.

In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be a fixed connection, a removable connection, or an integral connection; it may be a mechanical connection, an electrical connection, or a communicative connection; and it may be direct, indirect through an intervening medium, an internal connection between two elements, or any other interaction between them. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.

In the present invention, unless otherwise expressly stated or limited, "above" or "below" a first feature means that the first and second features are in direct contact, or that the first and second features are not in direct contact but are in contact with each other via another feature therebetween. Also, the first feature being "on," "above" and "over" the second feature includes the first feature being directly on and obliquely above the second feature, or merely indicating that the first feature is at a higher level than the second feature. A first feature being "under," "below," and "beneath" a second feature includes the first feature being directly under and obliquely below the second feature, or simply meaning that the first feature is at a lesser elevation than the second feature.

The following disclosure provides many different embodiments or examples for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Of course, they are merely examples and are not intended to limit the present invention. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples, such repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. In addition, the present invention provides examples of various specific processes and materials, but one of ordinary skill in the art may recognize applications of other processes and/or uses of other materials.

Referring to fig. 1, fig. 1 is a flowchart of a speech recognition method according to a first embodiment of the present invention.

The speech recognition method of the embodiment comprises the following steps:

S101, acquiring voice data of a user and age information of the user.

The voice data may be voice uploaded by the user or voice collected in real time. It is the sound of the user speaking, as distinguished from non-user sounds such as environmental sounds or animal sounds.

S102, determining the frequency range information of the voice of the user according to the age information.

The age information of the user can be looked up in the user's registration information.

For example, users may be roughly classified into children, young adults, middle-aged adults and elderly adults, each class corresponding to a different age group. The speech of each age group corresponds to a different frequency range.
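As a minimal sketch of the lookup in step S102, the age information can be mapped to an age group and then to a frequency range. The age boundaries and the hertz values below are illustrative assumptions only; the description gives no figures, and the names `AGE_GROUPS`, `FREQUENCY_RANGES_HZ`, `age_group` and `frequency_range_for_age` are hypothetical.

```python
from typing import Tuple

# Hypothetical age-group boundaries (inclusive); not taken from the description.
AGE_GROUPS = [
    (0, 12, "child"),
    (13, 39, "young adult"),
    (40, 59, "middle-aged"),
    (60, 150, "elderly"),
]

# Hypothetical (low, high) speech frequency ranges in Hz; placeholder values only.
FREQUENCY_RANGES_HZ = {
    "child": (250.0, 4000.0),
    "young adult": (100.0, 3500.0),
    "middle-aged": (90.0, 3200.0),
    "elderly": (80.0, 3000.0),
}


def age_group(age: int) -> str:
    """Return the label of the age group that contains `age`."""
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def frequency_range_for_age(age: int) -> Tuple[float, float]:
    """Map an age to the (low, high) frequency range of its age group."""
    return FREQUENCY_RANGES_HZ[age_group(age)]
```

In practice the table would be filled with measured ranges for each age group rather than the placeholder values used here.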

S103, dividing the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information.

The voice of each age group corresponds to different frequency range information. Because the frequency bandwidth of speech differs between age groups, the sub-frequency bands can be divided according to the total width of the frequency band.
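One possible way to divide the band in step S103 is to split it into contiguous sub-bands of equal width. The equal-width rule and the function name `split_into_sub_bands` are assumptions made for illustration; the description only states that the division depends on the total width of the band.

```python
from typing import List, Tuple


def split_into_sub_bands(low_hz: float, high_hz: float, count: int) -> List[Tuple[float, float]]:
    """Split [low_hz, high_hz] into `count` contiguous sub-bands of equal width."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if high_hz <= low_hz:
        raise ValueError("high_hz must exceed low_hz")
    width = (high_hz - low_hz) / count
    # The last edge is set to high_hz directly to avoid rounding drift.
    edges = [low_hz + i * width for i in range(count)] + [high_hz]
    return list(zip(edges[:-1], edges[1:]))
```

For example, `split_into_sub_bands(100.0, 500.0, 4)` yields four sub-bands that are each 100 Hz wide.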

S104, setting a filtering algorithm corresponding to each sub-frequency band.

For example, a voice data optimization table may be preset, in which multiple sub-bands are stored, and the voice data of different sub-bands correspond to different filtering algorithms.

The filtering algorithm corresponding to each sub-band is executed by the Gaussian band-pass filter corresponding to that sub-band. Of course, a Gaussian band-pass filter with multiple sub-bands may also be provided.
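A Gaussian band-pass filter for one sub-band can be sketched as a gain curve that peaks at the centre of the sub-band and decays on either side. The choice of centre (the midpoint of the sub-band) and of standard deviation (half the sub-band width) is an assumption for illustration; the description does not specify the filter parameters, and `gaussian_band_pass_gain` is a hypothetical name.

```python
import math


def gaussian_band_pass_gain(freq_hz: float, low_hz: float, high_hz: float) -> float:
    """Gain in (0, 1] of a Gaussian band-pass filter centred on the sub-band.

    Assumption: sigma is half the sub-band width, so the gain at either
    band edge equals exp(-0.5).
    """
    center = (low_hz + high_hz) / 2.0
    sigma = (high_hz - low_hz) / 2.0
    return math.exp(-((freq_hz - center) ** 2) / (2.0 * sigma ** 2))
```

A narrower sigma would make each sub-band filter more selective at the cost of attenuating components near the band edges.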

S105, filtering the voice data of the corresponding sub-frequency band by adopting the corresponding filtering algorithm to obtain the filtered optimized voice data.

In this step, each speech data segment may be filtered using the filtering algorithm of the corresponding sub-band. Specifically, each speech data segment may be filtered using the Gaussian band-pass filter for the corresponding sub-band.
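The filtering in step S105 could, for instance, be carried out in the frequency domain: transform the segment, weight each frequency bin by the Gaussian gain of the sub-band, and transform back. The sketch below uses a naive discrete Fourier transform from the standard library so that it stays self-contained; it is meant for short segments only. The function names and the Gaussian parameters (centre at the sub-band midpoint, sigma equal to half the width) are assumptions, not details from the description.

```python
import cmath
import math
from typing import List, Sequence


def dft(samples: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft_real(spectrum: Sequence[complex]) -> List[float]:
    """Naive inverse DFT, returning only the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def gaussian_band_pass_filter(
    samples: Sequence[float], sample_rate_hz: float, low_hz: float, high_hz: float
) -> List[float]:
    """Filter `samples` with a Gaussian band-pass centred on [low_hz, high_hz]."""
    n = len(samples)
    center = (low_hz + high_hz) / 2.0  # assumed centre frequency
    sigma = (high_hz - low_hz) / 2.0   # assumed standard deviation
    filtered = []
    for k, value in enumerate(dft(samples)):
        # Bins above n/2 hold negative frequencies; min(k, n - k) gives their
        # magnitude so the gain is symmetric and the output stays real.
        freq_hz = min(k, n - k) * sample_rate_hz / n
        gain = math.exp(-((freq_hz - center) ** 2) / (2.0 * sigma ** 2))
        filtered.append(value * gain)
    return inverse_dft_real(filtered)
```

A production implementation would normally use a fast Fourier transform from a numerical library instead of the quadratic-time transform shown here.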

S106, performing voice recognition on the optimized voice data.

In this step, existing common speech recognition algorithms can be used for speech recognition.

It is understood that in order to improve recognition accuracy, different speech recognition databases may be set according to users of different age groups to improve matching accuracy and efficiency.

Referring to fig. 2, fig. 2 is a flowchart of a speech recognition method according to a second embodiment of the present invention.

The speech recognition method of the embodiment comprises the following steps:

S201, acquiring voice data of a user and age information of the user.

In one embodiment, the step S201 may include:

S2011, acquiring voice data of a user;

S2012, acquiring the registration information of the user to obtain the age information of the user.

For example, the user may register personal information in a preset application program, upload voice through the application program, or send voice collection prompt information to the user through the application program, and then collect voice in real time through the microphone. The personal information includes information such as age information, sex, and contact information. The age information of the user can be obtained by acquiring personal information on the application program.

Of course, in other embodiments, the eyeball information of the user may also be collected by the camera, and the age of the user may be obtained by comparing the eyeball information with the preset eyeball information list. Eyeball information corresponding to different age groups is stored in the preset eyeball information list.

S202, determining the frequency range information of the voice of the user according to the age information.

S203, dividing the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information.

S204, setting a filtering algorithm corresponding to each sub-frequency band.

The filtering algorithm corresponding to each sub-band is executed by the Gaussian band-pass filter corresponding to that sub-band. Of course, in one embodiment, a Gaussian band-pass filter with multiple sub-bands may be provided.

S205, dividing the voice data into a plurality of voice data segments, wherein each voice data segment is located in one of the plurality of sub-frequency bands.

In this step, the voice data may be divided into a plurality of voice data segments according to the band information of the voice data.
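The description does not say how a voice data segment is associated with a sub-band. One possible reading, sketched below, cuts the signal into fixed-length frames, estimates the dominant frequency of each frame, and assigns the frame to the sub-band that contains that frequency. The frame-based approach, the naive DFT peak search, and the names `dominant_frequency` and `assign_segments` are all assumptions made for illustration.

```python
import cmath
import math
from typing import Dict, List, Sequence, Tuple


def dominant_frequency(frame: Sequence[float], sample_rate_hz: float) -> float:
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(frame)
    best_bin, best_magnitude = 0, -1.0
    for k in range(1, n // 2 + 1):
        magnitude = abs(
            sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        )
        if magnitude > best_magnitude:
            best_bin, best_magnitude = k, magnitude
    return best_bin * sample_rate_hz / n


def assign_segments(
    samples: Sequence[float],
    sample_rate_hz: float,
    frame_length: int,
    sub_bands: Sequence[Tuple[float, float]],
) -> Dict[int, List[List[float]]]:
    """Group fixed-length frames by the index of the sub-band they fall in."""
    assignments: Dict[int, List[List[float]]] = {i: [] for i in range(len(sub_bands))}
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frame = list(samples[start:start + frame_length])
        freq_hz = dominant_frequency(frame, sample_rate_hz)
        for index, (low_hz, high_hz) in enumerate(sub_bands):
            if low_hz <= freq_hz < high_hz:
                assignments[index].append(frame)
                break
        # Frames whose dominant frequency lies outside every sub-band are dropped.
    return assignments
```

Each group of frames can then be passed to the filtering algorithm of its own sub-band.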

S206, filtering the voice data segments by adopting the corresponding filtering algorithm according to the sub-band corresponding to each voice data segment to obtain the filtered optimized voice data.

In this step, the sub-band corresponding to each voice data segment is acquired, and the voice data segment is filtered by the filtering algorithm corresponding to that sub-band to obtain the filtered optimized voice data.

S207, performing voice recognition on the optimized voice data.

It can be understood that, in order to improve the recognition accuracy, the invention sets different voice recognition databases according to users of different age groups so as to improve the matching accuracy and efficiency.

In some embodiments, the step S207 may include:

S2071, acquiring a corresponding voice recognition database according to the age information of the user;

S2072, performing voice recognition on the optimized voice data according to the voice recognition database.

Specifically, in some embodiments, the step S2072 may be:

(1) extracting the voice features of the voice data of each sub-frequency band by using the Gaussian band-pass filter corresponding to that sub-frequency band;

(2) extracting the amplitude spectrum of the voice features, training a convolutional neural network by using the amplitude spectrum larger than a threshold value, and performing voice recognition on the optimized voice data by using the trained convolutional neural network and based on the corresponding voice recognition database.
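The amplitude-spectrum step can be sketched as follows, under one possible reading in which only the spectral components whose amplitude exceeds the threshold are kept as network input. The normalisation by the frame length, the one-sided spectrum, and the names `amplitude_spectrum` and `select_above_threshold` are assumptions; training the convolutional neural network itself is outside the scope of this sketch.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def amplitude_spectrum(samples: Sequence[float]) -> List[float]:
    """One-sided amplitude spectrum (bins 0..n/2) of a naive DFT, scaled by 1/n."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
        for k in range(n // 2 + 1)
    ]


def select_above_threshold(
    spectrum: Sequence[float], threshold: float
) -> List[Tuple[int, float]]:
    """Keep (bin index, amplitude) pairs whose amplitude exceeds the threshold."""
    return [(k, a) for k, a in enumerate(spectrum) if a > threshold]
```

The selected pairs would then be arranged into whatever input layout the chosen network expects.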

Therefore, this embodiment acquires the voice data of the user and the age information of the user; determines the frequency range information of the voice of the user according to the age information; divides the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information; sets a filtering algorithm corresponding to each sub-frequency band; divides the voice data into a plurality of voice data segments, each located in one of the plurality of sub-frequency bands; filters the voice data segments by adopting the filtering algorithm of the sub-frequency band corresponding to each segment to obtain the filtered optimized voice data; and performs voice recognition on the optimized voice data. Compared with the previous embodiment, the voice data is additionally divided into a plurality of voice data segments according to frequency range, and each segment is filtered with its corresponding filtering algorithm, which further improves the accuracy of voice recognition.

Referring to fig. 3, fig. 3 is a schematic structural diagram of a speech recognition device according to a third embodiment of the present invention.

The speech recognition apparatus 20 includes: a first obtaining module 21, a second obtaining module 22, a dividing module 23, a setting module 24, a filtering module 25 and a voice recognition module 26.

The first obtaining module 21 is used for obtaining voice data of a user and age information of the user.

With reference to fig. 4, the first obtaining module 21 includes: a first acquisition unit 211 and a second acquisition unit 212.

The first acquisition unit 211 is used for acquiring voice data of a user;

the second acquisition unit 212 is used for acquiring registration information of the user to obtain age information of the user.

The second obtaining module 22 is used for determining the frequency range information of the voice of the user according to the age information.

The dividing module 23 is configured to divide the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information.

The setting module 24 is configured to set a filtering algorithm corresponding to each of the sub-bands.

The filtering module 25 is configured to filter the voice data of the corresponding sub-band by using a corresponding filtering algorithm to obtain the filtered optimized voice data.

Each speech data segment may be filtered using the filtering algorithm of the corresponding sub-band. Specifically, each speech data segment may be filtered using the Gaussian band-pass filter for the corresponding sub-band.

With reference to fig. 4, in an embodiment, the filtering module 25 may include: a dividing unit 251 and a filtering unit 252.

The dividing unit 251 is configured to divide the voice data into a plurality of voice data segments, where each voice data segment is located in one of the plurality of sub-bands;

the filtering unit 252 is configured to filter the voice data segments by using a corresponding filtering algorithm according to the sub-band corresponding to each voice data segment, so as to obtain the filtered optimized voice data.

The filtering algorithm corresponding to each sub-band in the filtering unit 252 is performed by using the Gaussian band-pass filter corresponding to the sub-band.

The voice recognition module 26 is used for performing voice recognition on the optimized voice data.

In one embodiment, the speech recognition module 26 is configured to perform speech recognition using conventional speech recognition algorithms.

Referring to fig. 4, the speech recognition module 26 may include: a database acquisition unit 261 and a speech recognition unit 262.

A database obtaining unit 261, configured to obtain a corresponding voice recognition database according to the age information of the user;

a speech recognition unit 262, configured to perform speech recognition on the optimized speech data according to the speech recognition database.

The speech recognition unit 262 is specifically configured to: extracting the voice feature of the voice data of each sub-frequency band by using a Gaussian band-pass filter corresponding to each sub-frequency band; and extracting the amplitude spectrum of the voice features, and training a convolutional neural network by using the amplitude spectrum larger than a threshold value so as to perform voice recognition on the optimized voice data by using the trained convolutional neural network.

Therefore, the apparatus acquires the voice data of the user and the age information of the user; determines the frequency range information of the voice of the user according to the age information; divides the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information; sets a filtering algorithm corresponding to each sub-frequency band; filters the voice data of the corresponding sub-frequency band by adopting the corresponding filtering algorithm to obtain the filtered optimized voice data; and performs voice recognition on the optimized voice data. Because the voice data is filtered according to the age information before voice recognition is performed, both the accuracy and the efficiency of voice recognition are improved.
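The cooperation of the six modules can be sketched as a small pipeline in which each module is a replaceable callable. The class name, field names and type aliases below are hypothetical, and the sketch shows only the order in which the modules hand data to one another, not how any of them is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Band = Tuple[float, float]
Samples = Sequence[float]
Filter = Callable[[Samples], Samples]


@dataclass
class SpeechRecognitionApparatus:
    """Wires the six modules together; each field stands in for one module."""

    acquire: Callable[[], Tuple[Samples, int]]                # first obtaining module 21
    frequency_range: Callable[[int], Band]                    # second obtaining module 22
    divide: Callable[[Band], List[Band]]                      # dividing module 23
    set_filters: Callable[[List[Band]], List[Filter]]         # setting module 24
    apply_filters: Callable[[Samples, List[Filter]], Samples] # filtering module 25
    recognize: Callable[[Samples, int], str]                  # voice recognition module 26

    def run(self) -> str:
        """Run the modules in order and return the recognition result."""
        samples, age = self.acquire()
        band = self.frequency_range(age)
        sub_bands = self.divide(band)
        filters = self.set_filters(sub_bands)
        optimized = self.apply_filters(samples, filters)
        # The age is passed on so that an age-specific database can be selected.
        return self.recognize(optimized, age)
```

Keeping each module behind a plain callable makes it straightforward to substitute a different division rule or filter design without touching the rest of the pipeline.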

The invention also provides an electronic device comprising a processor and a memory. The processor is electrically connected with the memory. The processor is the control center of the electronic equipment: it connects the various parts of the whole electronic equipment by using various interfaces and lines, and it executes the various functions of the electronic equipment and processes data by running or calling the computer program stored in the memory and calling the data stored in the memory, thereby monitoring the electronic equipment as a whole.

In this embodiment, a processor of the electronic device loads instructions corresponding to processes of one or more computer programs into a memory according to the following steps, and the processor executes the computer programs stored in the memory, so as to implement various functions: acquiring voice data of a user and age information of the user; determining the frequency range information of the voice of the user according to the age information; dividing the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information; setting a filtering algorithm corresponding to each sub-frequency band; filtering the voice data of the corresponding sub-frequency band by adopting a corresponding filtering algorithm to obtain the filtered optimized voice data; and performing voice recognition on the optimized voice data.

The present invention also provides a storage medium having a computer program stored therein, which, when run on a computer, causes the computer to perform the method of any of the above embodiments, thereby implementing various functions: acquiring voice data of a user and age information of the user; determining the frequency range information of the voice of the user according to the age information; dividing the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information; setting a filtering algorithm corresponding to each sub-frequency band; filtering the voice data of the corresponding sub-frequency band by adopting a corresponding filtering algorithm to obtain the filtered optimized voice data; and performing voice recognition on the optimized voice data.

In the description herein, references to the description of the terms "one embodiment," "certain embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

In summary, although the present invention has been described with reference to the preferred embodiments, the above-described preferred embodiments are not intended to limit the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention, therefore, the scope of the present invention shall be determined by the appended claims.

Claims

1. A speech recognition method, comprising the steps of:

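acquiring voice data of a user and age information of the user;

determining the frequency range information of the voice of the user according to the age information;

dividing the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information;

setting a filtering algorithm corresponding to each sub-frequency band;

filtering the voice data of the corresponding sub-frequency band by adopting a corresponding filtering algorithm to obtain the filtered optimized voice data;

and performing voice recognition on the optimized voice data.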

2. The voice recognition method of claim 1, wherein the step of obtaining voice data of the user and age information of the user comprises:

acquiring voice data of a user;

and acquiring the registration information of the user to acquire the age information of the user.

3. The speech recognition method of claim 1, wherein the step of filtering the speech data of the corresponding sub-band by using the corresponding filtering algorithm to obtain the filtered optimized speech data comprises:

dividing the voice data into a plurality of voice data segments, each of the voice data segments being located within one of the plurality of sub-bands;

and filtering the voice data segments by adopting a corresponding filtering algorithm according to the sub-frequency band corresponding to each voice data segment to obtain the filtered optimized voice data.

4. The speech recognition method of claim 1, wherein the filtering algorithm corresponding to each sub-band is performed by using a Gaussian band-pass filter corresponding to the sub-band.

5. The speech recognition method of claim 1, wherein the step of performing speech recognition on the optimized speech data comprises:

acquiring a corresponding voice recognition database according to the age information of the user;

and performing voice recognition on the optimized voice data according to the voice recognition database.

6. The speech recognition method of claim 5, wherein the step of performing speech recognition on the optimized speech data according to the speech recognition database comprises:

extracting the voice feature of the voice data of each sub-frequency band by using a Gaussian band-pass filter corresponding to each sub-frequency band;

and extracting the amplitude spectrum of the voice characteristic, and training a convolutional neural network by using the amplitude spectrum larger than a threshold value so as to perform voice recognition on the optimized voice data by using the trained convolutional neural network.

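7. A speech recognition apparatus, comprising:

a first obtaining module, configured to acquire voice data of a user and age information of the user;

a second obtaining module, configured to determine the frequency range information of the voice of the user according to the age information;

a dividing module, configured to divide the corresponding frequency band into a plurality of sub-frequency bands according to the frequency range information;

a setting module, configured to set a filtering algorithm corresponding to each sub-frequency band;

a filtering module, configured to filter the voice data of the corresponding sub-frequency band by adopting a corresponding filtering algorithm to obtain the filtered optimized voice data;

and a voice recognition module, configured to perform voice recognition on the optimized voice data.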

8. The speech recognition device of claim 7, wherein the first obtaining module comprises:

the first acquisition unit is used for acquiring voice data of a user;

and the second acquisition unit is used for acquiring the registration information of the user so as to acquire the age information of the user.

9. The speech recognition device of claim 7, wherein the filtering module comprises:

a dividing unit, configured to divide the voice data into a plurality of voice data segments, where each voice data segment is located in one of the plurality of sub-bands;

and the filtering unit is used for filtering the voice data segments by adopting a corresponding filtering algorithm according to the sub-frequency band corresponding to each voice data segment so as to obtain the filtered optimized voice data.

10. An electronic device, characterized in that it comprises a processor and a memory, in which a computer program is stored, the processor being adapted to execute the speech recognition method of any one of claims 1 to 6 by calling the computer program stored in the memory.