Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be a fixed connection, a removable connection, or an integral connection; it may be a mechanical connection, an electrical connection, or a communicative connection; and it may be a direct connection, an indirect connection through an intervening medium, an internal communication between two elements, or an interaction between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
In the present invention, unless otherwise expressly stated or limited, a first feature being "above" or "below" a second feature may mean that the first and second features are in direct contact, or that the first and second features are not in direct contact but are in contact via another feature between them. Also, the first feature being "on," "above," or "over" the second feature includes the first feature being directly above or obliquely above the second feature, or merely indicates that the first feature is at a higher level than the second feature. The first feature being "under," "below," or "beneath" the second feature includes the first feature being directly below or obliquely below the second feature, or merely indicates that the first feature is at a lower level than the second feature.
The following disclosure provides many different embodiments or examples for implementing different features of the invention. To simplify the disclosure, the components and arrangements of specific examples are described below. They are merely examples and are not intended to limit the present invention. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. Such repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. In addition, the present invention provides examples of various specific processes and materials, but one of ordinary skill in the art may recognize the applicability of other processes and/or the use of other materials.
Refer to FIG. 1, which is a flowchart illustrating a speech recognition method according to an embodiment of the present invention.
The speech recognition method of this embodiment comprises the following steps:
S101: obtaining voice data of a user and age information of the user.
The voice data may be voice uploaded by the user or voice collected in real time. The voice data is the sound of the user speaking, as distinguished from non-user sounds such as environmental sounds or animal sounds.
S102: determining frequency range information of the user's voice according to the age information.
The age information of the user may be looked up in the user's registration information.
For example, users may be roughly classified into children, young adults, middle-aged adults, and elderly adults, each class corresponding to a different age group. The speech of each age group corresponds to a different frequency range; that is, the voices of children, young adults, middle-aged adults, and elderly adults each fall within a different frequency range.
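The age-to-frequency-range lookup of step S102 can be illustrated with a minimal Python sketch. The age boundaries, the frequency values, and the names `AGE_GROUPS`, `FREQUENCY_RANGES_HZ`, `age_group_for`, and `frequency_range_for` are assumptions made purely for illustration; the disclosure does not specify any numeric values.

```python
from typing import Dict, List, Tuple

# Hypothetical inclusive age boundaries in years; the source gives no numbers.
AGE_GROUPS: List[Tuple[int, int, str]] = [
    (0, 12, "child"),
    (13, 39, "young_adult"),
    (40, 59, "middle_aged"),
    (60, 150, "elderly"),
]

# Hypothetical voice frequency ranges in Hz, for illustration only.
FREQUENCY_RANGES_HZ: Dict[str, Tuple[float, float]] = {
    "child": (250.0, 4000.0),
    "young_adult": (100.0, 3500.0),
    "middle_aged": (90.0, 3200.0),
    "elderly": (80.0, 3000.0),
}


def age_group_for(age: int) -> str:
    """Return the age group whose inclusive bounds contain `age`."""
    for low, high, name in AGE_GROUPS:
        if low <= age <= high:
            return name
    raise ValueError(f"age out of supported range: {age}")


def frequency_range_for(age: int) -> Tuple[float, float]:
    """Look up the (low_hz, high_hz) range for the user's age group."""
    return FREQUENCY_RANGES_HZ[age_group_for(age)]
```

Under these assumed values, `frequency_range_for(8)` would return the range stored for the "child" group.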
S103: dividing the corresponding frequency band into a plurality of sub-bands according to the frequency range information.
The speech of each age group corresponds to different frequency range information. Because the bandwidth of speech differs between age groups, the sub-bands may be divided according to the total width of the frequency band.
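One possible way to divide a band according to its total width is sketched below. The equal-width split and the `target_width_hz` parameter are assumptions; the disclosure only states that the division depends on the total bandwidth.

```python
import math
from typing import List, Tuple


def sub_band_count(total_width_hz: float, target_width_hz: float) -> int:
    """Choose how many sub-bands are needed so that none exceeds the target width."""
    return max(1, math.ceil(total_width_hz / target_width_hz))


def split_into_sub_bands(
    low_hz: float, high_hz: float, target_width_hz: float
) -> List[Tuple[float, float]]:
    """Split [low_hz, high_hz] into equal-width, contiguous sub-bands."""
    if high_hz <= low_hz or target_width_hz <= 0:
        raise ValueError("need low_hz < high_hz and a positive target width")
    count = sub_band_count(high_hz - low_hz, target_width_hz)
    width = (high_hz - low_hz) / count
    # Use high_hz itself as the last edge to avoid floating-point drift.
    edges = [low_hz + i * width for i in range(count)] + [high_hz]
    return list(zip(edges[:-1], edges[1:]))
```

With this sketch, a wider band simply yields more sub-bands of roughly the same width, which is one reading of "divided according to the total width of the frequency band."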
S104: setting a filtering algorithm corresponding to each sub-band.
For example, a voice data optimization table may be preset. The table stores a plurality of sub-bands, and the voice data of different sub-bands corresponds to different filtering algorithms.
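A voice data optimization table of this kind might be represented as a list of entries, one per sub-band, each carrying the parameters of its filter. The `TableEntry` fields and the choice of a center frequency and a width parameter as the stored filter settings are hypothetical; the disclosure does not define the table layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class TableEntry:
    """One row of the (hypothetical) optimization table."""
    low_hz: float
    high_hz: float
    center_hz: float  # assumed filter parameter: center of the pass band
    sigma_hz: float   # assumed filter parameter: width of the pass band


def build_optimization_table(
    sub_bands: Sequence[Tuple[float, float]]
) -> List[TableEntry]:
    """Create one table row per sub-band with filter parameters derived from its edges."""
    return [
        TableEntry(low, high, (low + high) / 2.0, (high - low) / 2.0)
        for low, high in sub_bands
    ]


def entry_for_frequency(
    table: Sequence[TableEntry], freq_hz: float
) -> Optional[TableEntry]:
    """Return the row whose sub-band contains freq_hz, or None if no row matches."""
    for entry in table:
        if entry.low_hz <= freq_hz < entry.high_hz:
            return entry
    return None
```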
The filtering algorithm corresponding to each sub-band is performed by a Gaussian band-pass filter corresponding to that sub-band. Alternatively, a Gaussian band-pass filter having multiple sub-bands may be provided.
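As a rough illustration, the frequency response of a Gaussian band-pass filter can be written as a Gaussian centered on the sub-band. The sketch below assumes that the band edges lie one standard deviation from the center, and it models the multi-sub-band variant as the maximum of the individual responses; both choices are assumptions, since the disclosure gives no filter parameters.

```python
import math
from typing import Sequence, Tuple


def gaussian_band_pass_gain(freq_hz: float, low_hz: float, high_hz: float) -> float:
    """Gain in [0, 1] of a Gaussian band-pass response centered on the sub-band."""
    center = (low_hz + high_hz) / 2.0
    sigma = (high_hz - low_hz) / 2.0  # assumption: edges sit one sigma from center
    return math.exp(-0.5 * ((freq_hz - center) / sigma) ** 2)


def multi_band_gain(freq_hz: float, sub_bands: Sequence[Tuple[float, float]]) -> float:
    """Gain of a single filter covering several sub-bands (assumed: max of the responses)."""
    return max(gaussian_band_pass_gain(freq_hz, low, high) for low, high in sub_bands)
```

The gain is 1 at the center of a sub-band and falls off smoothly on either side, so frequencies far outside the band are strongly attenuated.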
S105: filtering the voice data of each sub-band with the corresponding filtering algorithm to obtain filtered, optimized voice data.
In this step, each voice data segment may be filtered with the filtering algorithm of its corresponding sub-band. Specifically, each voice data segment may be filtered with the Gaussian band-pass filter of the corresponding sub-band.
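The filtering step might be carried out in the frequency domain: transform the samples, weight each frequency bin by the Gaussian gain of the sub-band, and transform back. The following sketch uses a naive discrete Fourier transform from the standard library so that it stays self-contained; a practical implementation would normally use an FFT library. The function names and the one-sigma band-edge convention are assumptions.

```python
import cmath
import math
from typing import List, Sequence


def dft(samples: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (adequate for short examples)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft_real(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT, keeping only the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def filter_sub_band(
    samples: Sequence[float], sample_rate_hz: float, low_hz: float, high_hz: float
) -> List[float]:
    """Apply a Gaussian band-pass response for [low_hz, high_hz] to the samples."""
    n = len(samples)
    center = (low_hz + high_hz) / 2.0
    sigma = (high_hz - low_hz) / 2.0  # assumption: edges sit one sigma from center
    filtered = []
    for k, coeff in enumerate(dft(samples)):
        # Bins k and n - k carry the same physical frequency (positive/negative pair).
        freq_hz = min(k, n - k) * sample_rate_hz / n
        gain = math.exp(-0.5 * ((freq_hz - center) / sigma) ** 2)
        filtered.append(coeff * gain)
    return inverse_dft_real(filtered)
```

Because the same gain is applied to each positive and negative frequency pair, the filtered signal remains real-valued.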
S106: performing voice recognition on the optimized voice data.
In this step, an existing, commonly used speech recognition algorithm may be used.
It can be understood that, to improve recognition accuracy, different speech recognition databases may be set up for users of different age groups, thereby improving matching accuracy and efficiency.
In this embodiment, voice data of a user and age information of the user are obtained; frequency range information of the user's voice is determined according to the age information; the corresponding frequency band is divided into a plurality of sub-bands according to the frequency range information; a filtering algorithm is set for each sub-band; the voice data of each sub-band is filtered with the corresponding filtering algorithm to obtain filtered, optimized voice data; and voice recognition is performed on the optimized voice data. Because the voice data is filtered according to the age information before voice recognition is performed, both the accuracy and the efficiency of voice recognition are improved.
Refer to FIG. 2, which is a flowchart of a speech recognition method according to a second embodiment of the present invention.
The speech recognition method of this embodiment comprises the following steps:
S201: obtaining voice data of a user and age information of the user.
The voice data may be voice uploaded by the user or voice collected in real time. The voice data is the sound of the user speaking, as distinguished from non-user sounds such as environmental sounds or animal sounds.
In one embodiment, step S201 may include:
S2011: acquiring voice data of the user;
S2012: acquiring registration information of the user to obtain the age information of the user.
For example, the user may register personal information in a preset application program. The user may then upload voice through the application program, or the application program may send a voice collection prompt to the user and collect the voice in real time through a microphone. The personal information includes age information, sex, contact information, and the like. The age information of the user can be obtained from the personal information registered in the application program.
In other embodiments, eyeball information of the user may instead be collected by a camera, and the age of the user may be obtained by comparing the eyeball information against a preset eyeball information list, which stores eyeball information corresponding to different age groups.
S202: determining frequency range information of the user's voice according to the age information.
For example, users may be roughly classified into children, young adults, middle-aged adults, and elderly adults, each class corresponding to a different age group. The speech of each age group corresponds to a different frequency range; that is, the voices of children, young adults, middle-aged adults, and elderly adults each fall within a different frequency range.
S203: dividing the corresponding frequency band into a plurality of sub-bands according to the frequency range information.
The speech of each age group corresponds to different frequency range information. Because the bandwidth of speech differs between age groups, the sub-bands may be divided according to the total width of the frequency band.
S204: setting a filtering algorithm corresponding to each sub-band.
For example, a voice data optimization table may be preset. The table stores a plurality of sub-bands, and the voice data of different sub-bands corresponds to different filtering algorithms.
The filtering algorithm corresponding to each sub-band is performed by a Gaussian band-pass filter corresponding to that sub-band. Alternatively, in one embodiment, a Gaussian band-pass filter having multiple sub-bands may be provided.
S205: dividing the voice data into a plurality of voice data segments, wherein each voice data segment is located in one of the plurality of sub-bands.
In this step, the voice data may be divided into the plurality of voice data segments according to the frequency band information of the voice data.
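The disclosure does not say how a segment's band is determined. One possible reading, sketched below, cuts the samples into fixed-length frames and assigns each frame to the sub-band that contains its dominant frequency. The frame length, the dominant-frequency criterion, and all function names are assumptions.

```python
import cmath
import math
from typing import List, Optional, Sequence, Tuple


def dominant_frequency(frame: Sequence[float], sample_rate_hz: float) -> float:
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(frame)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):  # skip bin 0 (DC), stop at Nyquist
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n


def sub_band_index(
    freq_hz: float, sub_bands: Sequence[Tuple[float, float]]
) -> Optional[int]:
    """Index of the sub-band containing freq_hz, or None if it lies outside all of them."""
    for index, (low, high) in enumerate(sub_bands):
        if low <= freq_hz < high:
            return index
    return None


def segment_by_sub_band(
    samples: Sequence[float],
    sample_rate_hz: float,
    frame_len: int,
    sub_bands: Sequence[Tuple[float, float]],
) -> List[Tuple[Optional[int], List[float]]]:
    """Cut samples into frames and tag each frame with its sub-band index."""
    segments = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = list(samples[start:start + frame_len])
        index = sub_band_index(dominant_frequency(frame, sample_rate_hz), sub_bands)
        segments.append((index, frame))
    return segments
```

Each tagged frame can then be passed to the filter associated with its sub-band index; frames tagged `None` fall outside the user's frequency range.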
S206: filtering each voice data segment with the filtering algorithm of its corresponding sub-band to obtain filtered, optimized voice data.
In this step, the sub-band corresponding to each voice data segment is determined, and the voice data segment is filtered with the filtering algorithm corresponding to that sub-band to obtain the filtered, optimized voice data.
S207: performing voice recognition on the optimized voice data.
In this step, an existing, commonly used speech recognition algorithm may be used.
It can be understood that, to improve recognition accuracy, the invention sets up different voice recognition databases for users of different age groups, thereby improving matching accuracy and efficiency.
In some embodiments, step S207 may include:
S2071: acquiring a corresponding voice recognition database according to the age information of the user;
S2072: performing voice recognition on the optimized voice data according to the voice recognition database.
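Selecting a database by age, as in step S2071, could look like the following sketch. The `RecognitionDatabase` record, the database names, and the age boundaries are hypothetical placeholders; the disclosure does not describe the databases' contents or how they are stored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RecognitionDatabase:
    """Hypothetical handle to an age-specific voice recognition database."""
    name: str
    min_age: int
    max_age: int  # inclusive


# Hypothetical registry; names and age spans are illustrative only.
DATABASES = (
    RecognitionDatabase("child_db", 0, 12),
    RecognitionDatabase("young_adult_db", 13, 39),
    RecognitionDatabase("middle_aged_db", 40, 59),
    RecognitionDatabase("elderly_db", 60, 150),
)


def select_database(
    age: int, databases: Sequence[RecognitionDatabase] = DATABASES
) -> RecognitionDatabase:
    """Return the database whose age span contains the user's age."""
    for database in databases:
        if database.min_age <= age <= database.max_age:
            return database
    raise LookupError(f"no recognition database registered for age {age}")
```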
Specifically, in some embodiments, step S2072 may be:
(1) extracting the voice feature of the voice data of each sub-band by using the Gaussian band-pass filter corresponding to that sub-band;
(2) extracting the amplitude spectrum of the voice feature, training a convolutional neural network with the portion of the amplitude spectrum that is greater than a threshold, and performing voice recognition on the optimized voice data with the trained convolutional neural network, based on the corresponding voice recognition database.
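The amplitude-spectrum part of step (2) can be sketched as follows: compute the magnitude of each frequency bin and keep only the bins above a threshold. Training the convolutional neural network itself is outside the scope of this sketch. The function names and the threshold value used in any call are assumptions, and a naive DFT is used only to keep the example self-contained.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def amplitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..n/2 (naive O(n^2) transform, for short inputs)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def components_above_threshold(
    spectrum: Sequence[float], threshold: float
) -> List[Tuple[int, float]]:
    """Keep (bin index, amplitude) pairs whose amplitude exceeds the threshold."""
    return [(k, amp) for k, amp in enumerate(spectrum) if amp > threshold]
```

The retained (bin, amplitude) pairs would then serve as the training input described in step (2).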
In this embodiment, therefore, voice data of a user and age information of the user are obtained; frequency range information of the user's voice is determined according to the age information; the corresponding frequency band is divided into a plurality of sub-bands according to the frequency range information; a filtering algorithm is set for each sub-band; the voice data is divided into a plurality of voice data segments, each located in one of the plurality of sub-bands; each voice data segment is filtered with the filtering algorithm of its corresponding sub-band to obtain filtered, optimized voice data; and voice recognition is performed on the optimized voice data. Compared with the previous embodiment, the voice data is additionally divided into a plurality of voice data segments according to frequency range, and each segment is filtered with its own filtering algorithm, which further improves the accuracy of voice recognition.
Refer to FIG. 3, which is a schematic structural diagram of a speech recognition device according to a third embodiment of the present invention.
The speech recognition device 20 includes: a first obtaining module 21, a second obtaining module 22, a dividing module 23, a setting module 24, a filtering module 25, and a voice recognition module 26.
The first obtaining module 21 is configured to obtain voice data of a user and age information of the user.
The voice data may be voice uploaded by the user or voice collected in real time. The voice data is the sound of the user speaking, as distinguished from non-user sounds such as environmental sounds or animal sounds.
With reference to FIG. 4, the first obtaining module 21 includes a first acquisition unit 211 and a second acquisition unit 212.
The first acquisition unit 211 is configured to acquire voice data of the user.
The second acquisition unit 212 is configured to acquire registration information of the user to obtain the age information of the user.
The second obtaining module 22 is configured to determine frequency range information of the user's voice according to the age information.
The age information of the user may be looked up in the user's registration information.
For example, users may be roughly classified into children, young adults, middle-aged adults, and elderly adults, each class corresponding to a different age group. The speech of each age group corresponds to a different frequency range; that is, the voices of children, young adults, middle-aged adults, and elderly adults each fall within a different frequency range.
The dividing module 23 is configured to divide the corresponding frequency band into a plurality of sub-bands according to the frequency range information.
The speech of each age group corresponds to different frequency range information. Because the bandwidth of speech differs between age groups, the sub-bands may be divided according to the total width of the frequency band.
The setting module 24 is configured to set a filtering algorithm corresponding to each of the sub-bands.
For example, a voice data optimization table may be preset. The table stores a plurality of sub-bands, and the voice data of different sub-bands corresponds to different filtering algorithms.
The filtering algorithm corresponding to each sub-band is performed by a Gaussian band-pass filter corresponding to that sub-band. Alternatively, a Gaussian band-pass filter having multiple sub-bands may be provided.
The filtering module 25 is configured to filter the voice data of each sub-band with the corresponding filtering algorithm to obtain filtered, optimized voice data.
Each voice data segment may be filtered with the filtering algorithm of its corresponding sub-band. Specifically, each voice data segment may be filtered with the Gaussian band-pass filter of the corresponding sub-band.
With reference to FIG. 4, in an embodiment, the filtering module 25 may include a dividing unit 251 and a filtering unit 252.
The dividing unit 251 is configured to divide the voice data into a plurality of voice data segments, where each voice data segment is located in one of the plurality of sub-bands.
The filtering unit 252 is configured to filter each voice data segment with the filtering algorithm of its corresponding sub-band, so as to obtain the filtered, optimized voice data.
In the filtering unit 252, the filtering algorithm corresponding to each sub-band is performed by a Gaussian band-pass filter corresponding to that sub-band.
The voice recognition module 26 is configured to perform voice recognition on the optimized voice data.
In one embodiment, the voice recognition module 26 is configured to perform voice recognition using a conventional speech recognition algorithm.
Referring to FIG. 4, the voice recognition module 26 may include a database acquisition unit 261 and a speech recognition unit 262.
The database acquisition unit 261 is configured to acquire a corresponding voice recognition database according to the age information of the user.
The speech recognition unit 262 is configured to perform voice recognition on the optimized voice data according to the voice recognition database.
The speech recognition unit 262 is specifically configured to: extract the voice feature of the voice data of each sub-band by using the Gaussian band-pass filter corresponding to that sub-band; extract the amplitude spectrum of the voice feature; and train a convolutional neural network with the portion of the amplitude spectrum that is greater than a threshold, so as to perform voice recognition on the optimized voice data with the trained convolutional neural network.
In this embodiment, therefore, voice data of a user and age information of the user are obtained; frequency range information of the user's voice is determined according to the age information; the corresponding frequency band is divided into a plurality of sub-bands according to the frequency range information; a filtering algorithm is set for each sub-band; the voice data of each sub-band is filtered with the corresponding filtering algorithm to obtain filtered, optimized voice data; and voice recognition is performed on the optimized voice data. Because the voice data is filtered according to the age information before voice recognition is performed, both the accuracy and the efficiency of voice recognition are improved.
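The way the six modules of the device cooperate can be outlined as a simple pipeline in which each module is a pluggable callable. This is only a structural sketch: the class name, the field names, and the callable signatures are assumptions, and the reference numerals in the comments merely indicate which module each field stands for.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Band = Tuple[float, float]
Samples = Sequence[float]
Filter = Callable[[Samples], Samples]


@dataclass
class SpeechRecognitionDevice:
    obtain_input: Callable[[], Tuple[Samples, int]]   # first obtaining module 21
    frequency_range_for: Callable[[int], Band]        # second obtaining module 22
    divide_band: Callable[[Band], List[Band]]         # dividing module 23
    set_filters: Callable[[List[Band]], List[Filter]]  # setting module 24
    recognize: Callable[[List[Samples]], str]         # voice recognition module 26

    def run(self) -> str:
        """Run the modules in the order described for the device."""
        samples, age = self.obtain_input()
        band = self.frequency_range_for(age)
        sub_bands = self.divide_band(band)
        filters = self.set_filters(sub_bands)
        # Filtering module 25: apply each sub-band's filter to the voice data.
        optimized = [apply_filter(samples) for apply_filter in filters]
        return self.recognize(optimized)
```

Keeping each module behind a callable mirrors the modular structure of the device and lets any one stage be replaced without touching the others.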
The invention also provides an electronic device comprising a processor and a memory. The processor is electrically connected to the memory. The processor is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and it performs the functions of the electronic device and processes data by running or calling computer programs stored in the memory and calling data stored in the memory, thereby monitoring the electronic device as a whole.
In this embodiment, the processor of the electronic device loads instructions corresponding to the processes of one or more computer programs into the memory and executes the computer programs stored in the memory, so as to implement the following functions: acquiring voice data of a user and age information of the user; determining frequency range information of the user's voice according to the age information; dividing the corresponding frequency band into a plurality of sub-bands according to the frequency range information; setting a filtering algorithm corresponding to each sub-band; filtering the voice data of each sub-band with the corresponding filtering algorithm to obtain filtered, optimized voice data; and performing voice recognition on the optimized voice data.
The present invention also provides a storage medium having a computer program stored therein which, when run on a computer, causes the computer to perform the method of any of the above embodiments, thereby implementing the following functions: acquiring voice data of a user and age information of the user; determining frequency range information of the user's voice according to the age information; dividing the corresponding frequency band into a plurality of sub-bands according to the frequency range information; setting a filtering algorithm corresponding to each sub-band; filtering the voice data of each sub-band with the corresponding filtering algorithm to obtain filtered, optimized voice data; and performing voice recognition on the optimized voice data.
In these embodiments, therefore, voice data of a user and age information of the user are obtained; frequency range information of the user's voice is determined according to the age information; the corresponding frequency band is divided into a plurality of sub-bands according to the frequency range information; a filtering algorithm is set for each sub-band; the voice data of each sub-band is filtered with the corresponding filtering algorithm to obtain filtered, optimized voice data; and voice recognition is performed on the optimized voice data. Because the voice data is filtered according to the age information before voice recognition is performed, both the accuracy and the efficiency of voice recognition are improved.
In the description herein, reference to the terms "one embodiment," "certain embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
In summary, although the present invention has been described with reference to the preferred embodiments, the above-described preferred embodiments are not intended to limit the present invention. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, the scope of the present invention shall be determined by the appended claims.