Detailed Description
The embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
Artificial intelligence (Artificial Intelligence, AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of sensing, reasoning and decision-making.
Artificial intelligence is a comprehensive discipline that involves a wide range of fields, covering both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning. The key technologies of speech technology (Speech Technology) are automatic speech recognition (Automatic Speech Recognition, ASR), speech synthesis (Text To Speech, TTS) and voiceprint recognition. Enabling computers to listen, see, speak and feel is the development direction of human-computer interaction in the future, and voice is expected to become one of the preferred modes of human-computer interaction.
The speech recognition technology involved in artificial intelligence refers to a technology that converts pronunciation data into corresponding text information or operation instructions by using a voiceprint recognition algorithm, a speech conversion algorithm, and the like. The pronunciation data may be input by a user or downloaded from a network, and its language may include, but is not limited to, Chinese, English, French, etc.; the pronunciation data may specifically correspond to a word (e.g., an English word), a character (e.g., a Chinese character), or a plurality of words or phrases. The audio recognition process may specifically include the following three stages: 1. a feature extraction stage for the pronunciation data to be recognized; 2. a stage of acquiring the pronunciation information corresponding to the pronunciation data; 3. a stage of determining the text information of the pronunciation data according to the pronunciation information. The three stages are described in detail below in connection with fig. 1.
Fig. 1 is a schematic structural diagram of an audio recognition system according to an exemplary embodiment of the present application, where the audio recognition system includes a server and at least one terminal. The terminal is a user-facing device, and may specifically be a smart device such as a smartphone, a tablet computer, a portable personal computer, a smart watch, a smart band, or a smart television. The server may be a stand-alone server, a server cluster composed of several servers, or a cloud computing center. In an exemplary embodiment of the present application, the terminal may be configured to collect pronunciation data, and the server may serve as the audio recognition device; that is, the server may include a decoder for audio recognition and use the built-in decoder to recognize the pronunciation data collected by the terminal, so as to obtain a recognition result. In another exemplary embodiment of the present application, the server may send the decoder to the terminal, and the terminal may then both collect the pronunciation data and serve as the audio recognition device that directly uses the decoder to recognize the pronunciation data and obtain the recognition result. In the subsequent embodiments of the present application, it is assumed that the terminal collects the pronunciation data and the server serves as the audio recognition device that performs audio recognition on the pronunciation data collected by the terminal.
The decoder is a tool for performing audio recognition. As shown in fig. 2, the decoder is a recognition network established based on an acoustic model, a pronunciation dictionary and a language model, and comprises a plurality of paths, each path corresponding to one combination of text information and pronunciation information of the pronunciation data. The recognition network is used for searching for the path with the highest decoding score for the pronunciation data to be recognized and outputting the text information corresponding to the pronunciation data based on that path, thereby completing audio recognition.
The acoustic model is a model for forming a large number of acoustic decoding paths, where the acoustic decoding paths correspond to the pronunciation information of the pronunciation data. The pronunciation information corresponding to the pronunciation data comprises at least one candidate pronunciation unit set, each candidate pronunciation unit set comprising a plurality of pronunciation units and an acoustic score for each pronunciation unit; the acoustic score may be equal to the difference between the posterior probability and the prior probability of the pronunciation unit. One acoustic decoding path corresponds to one candidate pronunciation unit set, and each acoustic decoding path is used for indicating the pronunciation order of the pronunciation units in the corresponding candidate pronunciation unit set. The acoustic score indicates the degree of match between the pronunciation data and the pronunciation unit: the greater the degree of match, the higher the acoustic score; the smaller the degree of match, the lower the acoustic score. Accordingly, the higher the acoustic score of each pronunciation unit in a candidate pronunciation unit set, the higher the degree of match between each pronunciation unit in the set and the pronunciation data, i.e., the closer the standard pronunciation of each pronunciation unit in the set is to the pronunciation data, and thus the higher the accuracy of the candidate pronunciation unit set. The standard pronunciation of a pronunciation unit can be obtained statistically from a large amount of pronunciation data. Conversely, the lower the acoustic score of each pronunciation unit in a candidate pronunciation unit set, the lower the degree of match between each pronunciation unit in the set and the pronunciation data, i.e., the larger the difference between the standard pronunciation of each pronunciation unit in the set and the pronunciation data, and thus the lower the accuracy of the candidate pronunciation unit set. A pronunciation unit is a unit of pronunciation of the candidate text information corresponding to the pronunciation data: when the language of the pronunciation data is Chinese, the pronunciation unit may be a phoneme, an initial, a final or a syllable; when the language of the pronunciation data is English, the pronunciation unit may be a phoneme (phone) or a word piece (word-piece); each pronunciation unit may be represented by a plurality of pronunciation states. For example, as shown in fig. 3, for an acoustic model in which the language of the pronunciation data is English and each pronunciation unit is represented by three states, the model includes three acoustic decoding paths, namely acoustic decoding path 1, acoustic decoding path 2 and acoustic decoding path 3; each circle on an acoustic decoding path represents one pronunciation state of a pronunciation unit, and the arrows indicate the pronunciation order. Taking acoustic decoding path 1 as an example, the candidate pronunciation unit set corresponding to acoustic decoding path 1 includes the pronunciation units w, ah and n, whose pronunciation order is w, ah and n in sequence.
In fig. 3, the pronunciation states of each pronunciation unit are represented by the corresponding pronunciation unit itself (w, w and w; ah and ah; n and n, respectively); of course, the pronunciation states may also be represented by other information, such as s1, s2, s3, etc. The sil in the acoustic decoding paths in fig. 3 indicates silence, meaning that the acoustic recognition processing of the pronunciation data has been completed.
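For concreteness, a candidate pronunciation unit set can be pictured as a structure pairing each pronunciation unit with its acoustic score, in pronunciation order. The following Python sketch is purely illustrative: the class names and score values are hypothetical, not part of the embodiment, and the product rule for the set score anticipates step S5 below.

```python
from dataclasses import dataclass

@dataclass
class PronunciationUnit:
    symbol: str            # e.g. "w", "ah", "n" for the word "one" in fig. 3
    acoustic_score: float  # degree of match between this unit and the pronunciation data

@dataclass
class CandidateUnitSet:
    units: list  # stored in pronunciation order, mirroring one acoustic decoding path

    def set_score(self) -> float:
        # acoustic score of the whole set = product of the unit scores (see step S5)
        score = 1.0
        for unit in self.units:
            score *= unit.acoustic_score
        return score

# Acoustic decoding path 1 from fig. 3: units w, ah, n in pronunciation order
path1 = CandidateUnitSet([PronunciationUnit("w", 0.9),
                          PronunciationUnit("ah", 0.8),
                          PronunciationUnit("n", 0.85)])
print(path1.set_score())  # hypothetical scores, for illustration only
```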
The pronunciation dictionary comprises the set of words that the decoder can process and the pronunciation unit set of each word in that set, and can be used for mapping a pronunciation unit set to a word. The word set may include English words, Chinese characters, etc. For example, as shown in fig. 3, the pronunciation dictionary 11 includes English words and the pronunciation units corresponding to those words; from the pronunciation dictionary it can be seen that the pronunciation units of the word "one" include w, ah and n, and the pronunciation units of the English word "two" include t, uw, etc.
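A minimal sketch of such a dictionary in Python, using only the two entries named above; the reverse lookup from a pronunciation unit sequence to a word is an assumed illustration of how the mapping might be applied, not a prescribed implementation.

```python
# Pronunciation dictionary: word -> pronunciation unit sequence (entries from fig. 3)
pronunciation_dict = {
    "one": ["w", "ah", "n"],
    "two": ["t", "uw"],
}

# Reverse mapping: pronunciation unit sequence -> word, used to map a candidate
# pronunciation unit set to candidate text information
unit_to_word = {tuple(units): word for word, units in pronunciation_dict.items()}

print(unit_to_word.get(("w", "ah", "n")))  # -> "one"
```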
The language model is a model for forming a large number of language decoding paths, where the language decoding paths correspond to the text information of the pronunciation data; that is, one language decoding path corresponds to one piece of candidate text information of the pronunciation data, and the candidate text information is obtained by matching a candidate pronunciation unit set against the pronunciation dictionary. Each piece of candidate text information may be formed of one word or of a plurality of words or phrases, and each piece of candidate text information has a language score indicating the degree of similarity between the pronunciation units in the candidate pronunciation unit set and the pronunciation units in the pronunciation dictionary. Alternatively, the language score may also indicate the degree of association between a word and its context.
Based on the above description, please refer to the processing flow of audio recognition shown in fig. 4, which may include the following steps S1-S6.
S1, the terminal acquires pronunciation data to be identified and sends the pronunciation data to the server, wherein the pronunciation data can be acquired by the terminal through a voice device or downloaded from a network, and the voice device can be a microphone or the like.
S2, the server acquires an acoustic feature set corresponding to the pronunciation data, where the acoustic feature set comprises a plurality of acoustic features. To filter out noise in the pronunciation data, the pronunciation data may first be filtered to obtain processed pronunciation data, and the processed pronunciation data is then framed to obtain multiple frames of pronunciation sub-data. Each frame of pronunciation sub-data is then transformed into the frequency domain, and feature extraction is performed on each frame in the frequency domain to obtain the acoustic feature set corresponding to the pronunciation data. The acoustic feature set comprises a plurality of acoustic features arranged in order: each acoustic feature corresponds to one frame of pronunciation sub-data, and the order of the acoustic features in the set corresponds to the time order in which the pronunciation sub-data was collected. The acoustic features are used here to characterize the energy, amplitude, zero-crossing rate, linear prediction coefficients (Linear Prediction Coefficient, LPC), etc. of the pronunciation data, and may specifically include filter-bank (Fbank) features, mel-frequency cepstral coefficient (Mel-scale Frequency Cepstral Coefficients, MFCC) features, perceptual linear prediction (Perceptual Linear Predictive, PLP) features, etc.
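As an illustrative sketch of the filtering, framing and frequency-domain steps just described, the following Python/NumPy pipeline computes simple log-spectral features. It is a minimal sketch under assumed parameters (16 kHz audio, 25 ms frames, 10 ms shift); a production Fbank or MFCC front end would additionally apply a mel filter bank, and nothing here is mandated by the embodiment.

```python
import numpy as np

def extract_features(pronunciation_data: np.ndarray,
                     frame_len: int = 400, frame_shift: int = 160) -> np.ndarray:
    """Return one acoustic feature vector per frame of pronunciation sub-data."""
    # 1. simple filtering step (pre-emphasis) applied to the raw pronunciation data
    filtered = np.append(pronunciation_data[0],
                         pronunciation_data[1:] - 0.97 * pronunciation_data[:-1])
    # 2. framing: split into overlapping frames (the pronunciation sub-data),
    #    preserving the time order in which the audio was collected
    n_frames = 1 + max(0, (len(filtered) - frame_len) // frame_shift)
    frames = np.stack([filtered[i * frame_shift: i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # 3. frequency-domain transform of each frame (windowed FFT -> power spectrum)
    power = np.abs(np.fft.rfft(frames * np.hamming(frame_len), n=512)) ** 2
    # 4. feature extraction: log energies, a crude stand-in for Fbank/MFCC features
    return np.log(power + 1e-10)

features = extract_features(np.random.randn(16000))  # 1 s of dummy 16 kHz audio
print(features.shape)  # (98, 257): one feature vector per frame, in time order
```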
S3, the server inputs the acoustic feature set into the acoustic model of the decoder for acoustic recognition processing, obtaining the pronunciation information corresponding to the pronunciation data. The pronunciation information comprises a plurality of candidate pronunciation unit sets, each candidate pronunciation unit set comprising a plurality of pronunciation units and the acoustic score of each pronunciation unit.
S4, the server can query, through the pronunciation dictionary of the decoder, the candidate text information corresponding to each candidate pronunciation unit set, and calculate the language score of each piece of candidate text information through the language model.
S5, the server can calculate the acoustic score of each candidate pronunciation unit set according to the acoustic score of each pronunciation unit in each candidate pronunciation unit set, wherein the acoustic score of each candidate pronunciation unit set can be the product of the acoustic scores of the pronunciation units in the corresponding candidate pronunciation unit set. Further, the sum of the acoustic score of each candidate pronunciation unit set and the language score of the corresponding candidate text information can be determined to be the corresponding decoding score of the candidate text information, and the candidate text information with the highest decoding score is selected from the plurality of candidate text information to serve as the recognition result of pronunciation data, so that the audio recognition of the pronunciation data is completed.
S6, the server returns the identification result to the terminal.
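The score combination in step S5 can be sketched as follows: the acoustic score of a candidate set is the product of its unit scores, the decoding score adds the language score, and the highest-scoring candidate wins. The candidate words, unit scores and language scores below are hypothetical values chosen only to exercise the rule.

```python
import math

def decoding_score(unit_scores, language_score):
    # acoustic score of the candidate pronunciation unit set = product of the
    # acoustic scores of its pronunciation units (step S5)
    acoustic = math.prod(unit_scores)
    # decoding score = acoustic score of the set + language score of its text
    return acoustic + language_score

candidates = [  # (candidate text information, unit acoustic scores, language score)
    ("one", [0.9, 0.8, 0.85], 0.5),
    ("won", [0.9, 0.8, 0.85], 0.3),
]
best = max(candidates, key=lambda c: decoding_score(c[1], c[2]))
print(best[0])  # candidate text information with the highest decoding score -> "one"
```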
Steps S1-S2 constitute the feature extraction stage for the pronunciation data to be recognized; step S3 constitutes the stage of acquiring the pronunciation information corresponding to the pronunciation data; and steps S4-S6 constitute the stage of determining the text information of the pronunciation data according to the pronunciation information. In practice, it is found that, due to regional accents, pronunciation variants and other factors, some pronunciation units are pronounced inaccurately or insufficiently. For example: 1) Pronunciation units subject to pronunciation variants are prone to inaccurate pronunciation, where a pronunciation variant means that the same pronunciation unit is pronounced differently in different words. Pronunciation variants may include the following 4 cases: (1) the velarized alveolar approximant, accompanied by velarized or pharyngealized co-articulation, such as the pronunciation unit l in the words all and little; (2) alveolar flapping, where a pronunciation unit whose pronunciation order follows a vowel is not released, such as the pronunciation unit t of the word better; (3) affricated sounds, such as the pronunciation unit ts of the word its and the pronunciation unit dz of the word goods; (4) unaspirated pronunciation, such as when the pronunciation order of the pronunciation unit p, t or k in a word follows that of the pronunciation unit s, e.g. the pronunciation unit p of the word speak. 2) Regional accents easily cause some pronunciation units to be pronounced insufficiently; for example, the pronunciation unit t of the word basketball is easily under-pronounced. Likewise, overlapping sounds between adjacent words are easily elided, so that some pronunciation units are insufficiently pronounced; for example, in the phrase next to, the pronunciation unit t of the word next is easily elided and thus insufficiently pronounced. If pronunciation units are insufficiently or inaccurately pronounced as in 1) or 2) above, their acoustic scores tend to be low, so that the decoder cannot accurately recognize the text information corresponding to the pronunciation data and the expected audio recognition effect cannot be achieved.
In order to improve the accuracy of audio recognition, an embodiment of the present application provides an audio recognition method that improves the basic processing flow of steps S1-S5 above as follows: (1) in the stage of acquiring the pronunciation information corresponding to the pronunciation data, acoustic compensation processing is performed on the acoustic score of each pronunciation unit in the pronunciation unit set of the pronunciation data; (2) in the stage of determining the text information of the pronunciation data according to the pronunciation information, text recognition is performed on the pronunciation unit set after the acoustic compensation processing to obtain the text information corresponding to the pronunciation data. This improvement alleviates the problem of pronunciation units receiving low acoustic scores due to regional accents, pronunciation variants and similar factors: through the acoustic compensation processing, a pronunciation unit can be compensated to an appropriate acoustic score, so that the pronunciation data can be decoded correctly and the accuracy of audio recognition of the pronunciation data is improved.
Based on the above description, an audio recognition method according to an embodiment of the present application is described below with reference to fig. 5. The method may be performed by an audio recognition device, which may be, for example, the server or the terminal shown in fig. 1. As shown in fig. 5, the audio recognition method may include the following steps S101 to S104:
S101, acquiring pronunciation data to be identified, and extracting an acoustic feature set of the pronunciation data, wherein the acoustic feature set comprises a plurality of acoustic features.
The pronunciation data refers to pronunciation data that needs to be converted into text information. In one embodiment, the pronunciation data may be input by a user. Specifically, the terminal may include an audio control used to collect pronunciation data; if an operation on the audio control is detected, the audio data input by the user may be collected by a voice device of the terminal. The audio control may be a physical key or a virtual key, and the operation on the audio control may be a touch operation, a cursor operation, a key operation or a voice operation. The touch operation may be a touch click operation, a touch press operation or a touch slide operation, and may be a single-point or multi-point touch operation; the cursor operation may be an operation of controlling a cursor to click or an operation of controlling a cursor to press; the key operation may be a virtual key operation or a physical key operation, etc. In another embodiment, the pronunciation data may be downloaded from a network; for example, in a voice conversation scenario, the pronunciation data may be downloaded from the conversation window. After the pronunciation data is acquired, when the audio recognition device is the server, the terminal can send the pronunciation data to the server, and the server receives the pronunciation data and extracts the acoustic feature set of the pronunciation data, the acoustic feature set comprising a plurality of acoustic features; when the audio recognition device is the terminal, the terminal may directly extract the acoustic feature set of the pronunciation data. The manner of extracting the acoustic feature set of the pronunciation data is as described in steps S1-S2 above.
S102, performing acoustic recognition processing on the acoustic feature set of the pronunciation data to obtain a target pronunciation unit set corresponding to the pronunciation data, wherein the target pronunciation unit set comprises a plurality of pronunciation units and acoustic scores of the pronunciation units.
The audio recognition device may input the acoustic feature set of the pronunciation data into the acoustic model for acoustic recognition processing, so as to obtain a plurality of candidate pronunciation unit sets corresponding to the pronunciation data; the target pronunciation unit set may be any one of the plurality of candidate pronunciation unit sets. The acoustic model here may include, but is not limited to: acoustic models based on the hidden Markov model (Hidden Markov Model, HMM), such as the Gaussian mixture model-hidden Markov model (GMM-HMM) and the deep neural network-hidden Markov model (Deep Neural Networks Hidden Markov Model, DNN-HMM); and, of course, end-to-end (End to End) acoustic models, such as the connectionist temporal classification (Connectionist Temporal Classification, CTC) model, the long short-term memory (Long-Short Term Memory, LSTM) model, and the attention (Attention) model.
S103, performing acoustic compensation processing on the acoustic score of each pronunciation unit in the target pronunciation unit set.
In order to avoid the problem of a pronunciation unit receiving a low acoustic score due to inaccurate or insufficient pronunciation, the audio recognition device may perform acoustic compensation processing on the acoustic score of each pronunciation unit in the target pronunciation unit set so as to raise the acoustic scores of the pronunciation units in the set. Specifically, the audio recognition device may judge whether the target pronunciation unit set meets an acoustic compensation condition; if not, no acoustic compensation processing is performed on the target pronunciation unit set; if so, acoustic compensation processing is performed on the target pronunciation unit set. Here, the target pronunciation unit set not meeting the acoustic compensation condition may mean that every pronunciation unit in the target pronunciation unit set is fully and accurately pronounced, i.e., the acoustic score of each pronunciation unit in the set is high; this indicates that the standard pronunciations of the pronunciation units in the set match the pronunciation data well, i.e., the accuracy of the target pronunciation unit set is high, and the audio recognition device can directly perform text recognition on the target pronunciation unit set to obtain the recognition result of the pronunciation data. Optionally, the target pronunciation unit set not meeting the acoustic compensation condition may also mean that the acoustic scores of most pronunciation units in the set are low, i.e., the standard pronunciations of the pronunciation units match the pronunciation data poorly and the accuracy of the target pronunciation unit set is low; such a set can be discarded. The target pronunciation unit set meeting the acoustic compensation condition may mean that the acoustic scores of only a few pronunciation units in the set are low, i.e., only a few pronunciation units in the target pronunciation unit set are insufficiently or inaccurately pronounced.
In one embodiment, performing acoustic compensation processing on the acoustic score of each pronunciation unit in the target pronunciation unit set may specifically include: performing acoustic compensation processing on the acoustic scores of the insufficiently or inaccurately pronounced pronunciation units in the target pronunciation unit set. In another alternative embodiment, the acoustic score of the target pronunciation unit set may be calculated from the acoustic scores of the pronunciation units in the set, and the acoustic compensation processing may be applied to the acoustic score of the set itself.
S104, carrying out text recognition on the target pronunciation unit set subjected to the acoustic compensation processing to obtain text information corresponding to the pronunciation data.
The audio recognition device can determine, according to the pronunciation dictionary, the candidate text information corresponding to the target pronunciation unit set after the acoustic compensation processing; calculate the language score of the candidate text information through the language model; and calculate the acoustic score of the target pronunciation unit set from the acoustic scores of the pronunciation units in the set after the acoustic compensation processing. Further, the sum of the language score of the candidate text information and the acoustic score of the target pronunciation unit set is used as the decoding score of the candidate text information, and if the decoding score of the candidate text information is greater than a preset score threshold, the candidate text information is used as the recognition result of the pronunciation data. When the pronunciation data corresponds to a plurality of candidate pronunciation unit sets, each candidate pronunciation unit set is recognized to obtain its candidate text information; the decoding score of each piece of candidate text information is calculated as above, and the candidate text information with the highest decoding score is selected from among them as the recognition result of the pronunciation data.
In the embodiment of the present application, performing acoustic compensation processing on the acoustic scores of the pronunciation units in the target pronunciation unit set raises those acoustic scores, thereby avoiding the problem of a pronunciation unit receiving a low acoustic score due to inaccurate or insufficient pronunciation. In addition, performing text recognition on the target pronunciation unit set after the acoustic compensation processing to obtain the text information corresponding to the pronunciation data improves the accuracy of recognizing the pronunciation data, while the normal recognition of the pronunciation data of other words is not affected. Moreover, the audio recognition accuracy is improved without optimizing the acoustic model on a large amount of training data; that is, there is no need to collect a large amount of training data or to perform extensive iterative training of the acoustic model, which reduces the difficulty of data collection and saves substantial resources.
In one embodiment, each pronunciation unit includes a plurality of pronunciation states, each pronunciation state corresponding to one acoustic feature, and the acoustic features in the acoustic feature set of the pronunciation data are arranged in order. In this case, S102 may include the following steps s11-s13.
s11, sequentially recognizing each acoustic feature in the acoustic feature set according to the order of the acoustic features in the set.
s12, calculating the acoustic score of a pronunciation unit each time one is recognized.
s13, obtaining the target pronunciation unit set once every acoustic feature in the acoustic feature set has been recognized.
The order in which the pronunciation units in the target pronunciation unit set are recognized corresponds to the pronunciation order of those pronunciation units.
In steps s11-s13, the audio recognition device may sequentially recognize each acoustic feature in the acoustic feature set according to the order of the acoustic features in the set; as is known from step S2 above, one acoustic feature corresponds to one frame of pronunciation sub-data, and the order of the acoustic features corresponds to the order in which the pronunciation sub-data was collected. That is, the audio recognition device may input the acoustic features into the acoustic model one by one, in their order in the acoustic feature set, and calculate the acoustic score of a pronunciation unit each time one is recognized. After every acoustic feature in the acoustic feature set has been recognized, the target pronunciation unit set is obtained.
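Steps s11-s13 amount to a loop over the ordered acoustic features. The sketch below is a schematic rendering in Python: `acoustic_model` is a hypothetical stand-in that returns a (unit, score) pair when a pronunciation unit has just been recognized and None otherwise, since the embodiment does not fix a particular model interface.

```python
def recognize_sequentially(feature_set, acoustic_model):
    """Feed acoustic features to the acoustic model in collection order (s11),
    scoring each pronunciation unit as it is recognized (s12)."""
    target_unit_set = []
    for feature in feature_set:  # features are already in acquisition order
        unit = acoustic_model(feature)  # None while a unit is still forming
        if unit is not None:
            symbol, acoustic_score = unit  # score computed upon recognition
            target_unit_set.append((symbol, acoustic_score))
    # once every feature has been processed, the target set is complete (s13);
    # list order matches the pronunciation order of the units
    return target_unit_set

# Mock model: emits units w, ah, n across five feature frames
mock_outputs = iter([None, ("w", 0.9), None, ("ah", 0.8), ("n", 0.85)])
print(recognize_sequentially(range(5), lambda f: next(mock_outputs)))
```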
Optionally, before step S103, the method further includes the following step s21.
s21, during the acoustic recognition processing, judging, according to the acoustic score of each pronunciation unit in the target pronunciation unit set, whether the target pronunciation unit set meets the acoustic compensation condition; if so, executing step S103.
In step s21, if the target pronunciation unit set does not meet the acoustic compensation condition, the accuracy of the target pronunciation unit set is relatively low; performing acoustic compensation processing on it anyway would easily make the candidate text information corresponding to this low-accuracy set the recognition result, reducing the accuracy of audio recognition. Accordingly, the audio recognition device performs the acoustic compensation processing on the target pronunciation unit set only when the set meets the acoustic compensation condition. Specifically, the audio recognition device may judge in real time, during the acoustic recognition processing, whether the target pronunciation unit set meets the acoustic compensation condition according to the acoustic score of each pronunciation unit in the set: each time a pronunciation unit is recognized, whether the target pronunciation unit set meets the acoustic compensation condition is judged according to the acoustic score of the currently recognized pronunciation unit; if so, this indicates that the target pronunciation unit set contains a pronunciation unit that is insufficiently or inaccurately pronounced, and step S103 is executed. Compensating the acoustic scores of the pronunciation units in the target pronunciation unit set only when the set meets the acoustic compensation condition both avoids the problem of low acoustic scores caused by inaccurate or insufficient pronunciation and prevents acoustic compensation of sets that do not meet the condition, improving the accuracy and effectiveness of the acoustic compensation of the target pronunciation unit set.
In this embodiment, step s21 includes the following steps s31 to s34.
s31, each time a pronunciation unit is recognized, judging whether the currently recognized pronunciation unit is a first pronunciation unit, where the first pronunciation unit is a pronunciation unit to be compensated that is obtained through statistics in the historical audio recognition process.
s32, if so, verifying whether the acoustic score of the currently recognized pronunciation unit is smaller than a preset acoustic score threshold.
s33, if so, counting the number of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently recognized pronunciation unit, and comparing the acoustic score of each such pronunciation unit with the preset acoustic score threshold.
s34, if the counted number is greater than a first number threshold and the acoustic score of each pronunciation unit in the target pronunciation unit set whose pronunciation order precedes that of the currently recognized pronunciation unit is greater than or equal to the preset acoustic score threshold, determining that the target pronunciation unit set meets the acoustic compensation condition.
In steps s31 to s34, the audio recognition device may detect in real time, in combination with historical empirical data, whether the target pronunciation unit set meets the acoustic compensation condition. Specifically, each time the audio recognition device recognizes a pronunciation unit, it judges whether the currently recognized pronunciation unit is a first pronunciation unit. A first pronunciation unit is a pronunciation unit to be compensated that is obtained through statistics in the historical audio recognition process, i.e., a pronunciation unit that is prone to insufficient or inaccurate pronunciation; concretely, it is a pronunciation unit whose acoustic score fell below the preset acoustic score threshold, during historical audio recognition, with a frequency greater than a preset frequency threshold. For example, if in 10 historical audio recognitions the acoustic score of the pronunciation unit t was smaller than the preset acoustic score threshold in 8 of them, the pronunciation unit t is taken as a first pronunciation unit. If the currently recognized pronunciation unit is a first pronunciation unit, indicating that it is prone to insufficient or inaccurate pronunciation, it is verified whether its acoustic score is smaller than the preset acoustic score threshold. If its acoustic score is smaller than the preset acoustic score threshold, the number of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently recognized pronunciation unit is counted, and the acoustic score of each such pronunciation unit is compared with the preset acoustic score threshold. If the counted number is greater than the first number threshold and the acoustic score of each preceding pronunciation unit is greater than or equal to the preset acoustic score threshold, then among the recognized pronunciation units only the currently recognized one has a low acoustic score, i.e., only the currently recognized pronunciation unit is insufficiently or inaccurately pronounced; in other words, the accuracy of the target pronunciation unit set is high and only a few pronunciation units are mispronounced, so it is determined that the target pronunciation unit set meets the acoustic compensation condition. In this way, acoustic compensation processing of low-accuracy target pronunciation unit sets is avoided, and the accuracy of the acoustic compensation processing of the target pronunciation unit set is improved.
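A sketch of the real-time check in steps s31-s34. The set of first pronunciation units, the acoustic score threshold and the first number threshold are assumed inputs that, per the embodiment, would come from historical recognition statistics; the values in the usage line are hypothetical.

```python
def meets_compensation_condition(current_unit, current_score, preceding_scores,
                                 first_units, score_threshold, count_threshold):
    # s31: the currently recognized unit must be a first pronunciation unit,
    # i.e. one statistically prone to insufficient or inaccurate pronunciation
    if current_unit not in first_units:
        return False
    # s32: its acoustic score must fall below the preset acoustic score threshold
    if current_score >= score_threshold:
        return False
    # s33/s34: enough preceding units, all scoring at or above the threshold,
    # so that only the current unit scores low
    return (len(preceding_scores) > count_threshold and
            all(s >= score_threshold for s in preceding_scores))

# Hypothetical call: "t" is a first pronunciation unit, threshold 0.6,
# first number threshold 2, three well-scored preceding units
print(meets_compensation_condition("t", 0.4, [0.9, 0.8, 0.85],
                                   {"t"}, 0.6, 2))  # -> True
```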
Optionally, step s21 includes the following steps s41 to s43.
s41, each time a pronunciation unit is recognized, verifying whether the acoustic score of the currently recognized pronunciation unit is smaller than the preset acoustic score threshold.
s42, if so, counting the number of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently recognized pronunciation unit, and comparing the acoustic score of each such pronunciation unit with the preset acoustic score threshold.
s43, if the counted number is greater than a second number threshold and the acoustic score of each pronunciation unit in the target pronunciation unit set whose pronunciation order precedes that of the currently recognized pronunciation unit is greater than or equal to the preset acoustic score threshold, determining that the target pronunciation unit set meets the acoustic compensation condition.
In steps s41 to s43, the audio recognition device may detect in real time, during the acoustic recognition processing, whether the target pronunciation unit set meets the acoustic compensation condition. Specifically, each time a pronunciation unit is recognized, it is verified whether the acoustic score of the currently recognized pronunciation unit is smaller than the preset acoustic score threshold. If it is, the number of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently recognized pronunciation unit is counted, and the acoustic score of each such pronunciation unit is compared with the preset acoustic score threshold. If the counted number is greater than the second number threshold and the acoustic score of each preceding pronunciation unit is greater than or equal to the preset acoustic score threshold, then among the recognized pronunciation units only the currently recognized one has a low acoustic score, i.e., only the currently recognized pronunciation unit is insufficiently or inaccurately pronounced; in other words, the accuracy of the target pronunciation unit set is high and only a few pronunciation units are mispronounced, so it is determined that the target pronunciation unit set meets the acoustic compensation condition. In this way, acoustic compensation processing of low-accuracy target pronunciation unit sets is avoided, and the accuracy of the acoustic compensation processing of the target pronunciation unit set is improved. This variant differs from steps s31 to s34 only in that it omits the check against the first pronunciation units.
In this embodiment, step S103 may include the following steps s51 and s52.
s51, performing acoustic compensation processing on the acoustic score of the currently recognized pronunciation unit by using the acoustic scores of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently recognized pronunciation unit, so as to obtain the compensated acoustic score of the currently recognized pronunciation unit.
s52, updating the target pronunciation unit set with the compensated acoustic score of the currently recognized pronunciation unit to obtain the target pronunciation unit set after the acoustic compensation processing.
In steps s51 and s52, when it is detected in real time during the acoustic recognition processing that the target pronunciation unit set meets the acoustic compensation condition, the acoustic compensation processing may be performed on the set in real time. Specifically, the audio recognition device may perform acoustic compensation processing on the acoustic score of the currently recognized pronunciation unit using the acoustic scores of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently recognized pronunciation unit, so as to obtain its compensated acoustic score. In one embodiment, the maximum or the average of the acoustic scores of all preceding pronunciation units may be used to compensate the acoustic score of the currently recognized pronunciation unit. Optionally, an acoustic score may instead be selected at random from the acoustic scores of the preceding pronunciation units and used for the compensation. Further, the target pronunciation unit set is updated with the compensated acoustic score of the currently recognized pronunciation unit to obtain the target pronunciation unit set after the acoustic compensation processing. Since only the insufficiently or inaccurately pronounced pronunciation units in the target pronunciation unit set are compensated, the acoustic score of the set is raised while the accuracy of the compensation is maintained; moreover, compensating every pronunciation unit in the set is avoided, so the normal recognition of the pronunciation data of other words is not affected.
In this embodiment, step s51 may include the following steps s61 to s64.
s61, calculating a first average value of the acoustic scores of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently recognized pronunciation unit.
s62, obtaining the probability that the acoustic score of the currently recognized pronunciation unit is smaller than the preset acoustic score threshold.
s63, determining a compensation acoustic score for the currently recognized pronunciation unit according to the first average value and the probability.
s64, determining the sum of the acoustic score of the currently recognized pronunciation unit and the compensation acoustic score as the compensated acoustic score of the currently recognized pronunciation unit.
In steps s61 to s64, the audio recognition device may use the average of the acoustic scores of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently recognized pronunciation unit to perform acoustic compensation processing on the acoustic score of the currently recognized pronunciation unit. Specifically, the audio recognition device may calculate the first average value of those preceding acoustic scores with a preset averaging algorithm, which may be an arithmetic mean, a statistical mean, or the like. Further, the probability that the acoustic score of the currently recognized pronunciation unit is smaller than the preset acoustic score threshold is obtained, this probability having been gathered statistically during historical audio recognition processing; a compensation acoustic score for the currently recognized pronunciation unit is then determined from the first average value and the probability. Finally, the sum of the acoustic score of the currently recognized pronunciation unit and the compensation acoustic score is determined as the compensated acoustic score of the currently recognized pronunciation unit, which may be expressed as the following formula (1).
$$P(x'_n) = P(x_n) + \Delta P(x_n), \qquad \Delta P(x_n) = \alpha\,\bar{P}_n + \beta\,P_{prior}(x_n) \qquad (1)$$

In formula (1), $x_n$ represents the nth pronunciation unit in the target pronunciation unit set, i.e., $x_n$ is the currently recognized pronunciation unit; $P(x_n)$ represents the acoustic score of the currently recognized pronunciation unit; $P_{prior}(x_n)$ represents the probability that the acoustic score of the currently recognized pronunciation unit is smaller than the preset acoustic score threshold; $\bar{P}_n = \frac{1}{n-1}\sum_{i=1}^{n-1} P(x_i)$ represents the first average value; $\alpha$ and $\beta$ represent weight coefficients, which can be obtained through statistics in the historical audio recognition process; $\Delta P(x_n)$ represents the compensation acoustic score; and $P(x'_n)$ represents the compensated acoustic score of the currently recognized pronunciation unit.
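A direct transcription of formula (1) as reconstructed above, in Python. The weight coefficients alpha and beta and the prior probability are hypothetical values here; in the embodiment they come from statistics over historical audio recognition.

```python
def compensate_current_unit(preceding_scores, current_score, p_prior,
                            alpha=0.5, beta=0.5):
    """Formula (1): P(x'_n) = P(x_n) + alpha * first_average + beta * P_prior(x_n)."""
    # s61: first average = arithmetic mean of the acoustic scores of all units
    # whose pronunciation order precedes the currently recognized unit
    first_average = sum(preceding_scores) / len(preceding_scores)
    # s63: compensation acoustic score from the first average and the prior
    delta = alpha * first_average + beta * p_prior
    # s64: compensated score = original acoustic score + compensation score
    return current_score + delta

# Hypothetical values: three well-scored preceding units, a low current score,
# and a high prior probability of under-scoring for this unit
print(compensate_current_unit([0.9, 0.8, 0.85], 0.4, p_prior=0.8))  # -> 1.225
```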
Optionally, before step S103, the method further includes the following step s71.
s71, after the acoustic recognition processing is completed, judging, according to the acoustic score of each pronunciation unit in the target pronunciation unit set, whether the target pronunciation unit set meets the acoustic compensation condition; if so, executing step S103.
In step s71, if the target pronunciation unit set does not meet the acoustic compensation condition, the accuracy of the target pronunciation unit set is relatively low; performing acoustic compensation processing on it anyway would easily make the candidate text information corresponding to this low-accuracy set the recognition result, reducing the accuracy of audio recognition. Accordingly, the audio recognition device performs the acoustic compensation processing on the target pronunciation unit set only when the set meets the acoustic compensation condition. Specifically, after the acoustic recognition processing is completed, i.e., after all acoustic features have been recognized, the audio recognition device judges, according to the acoustic score of each pronunciation unit in the target pronunciation unit set, whether the set meets the acoustic compensation condition. If so, this indicates that the target pronunciation unit set contains pronunciation units that are insufficiently or inaccurately pronounced, and step S103 is executed; if not, no acoustic compensation processing is performed on the target pronunciation unit set. Compensating the acoustic scores of the pronunciation units in the target pronunciation unit set only when the set meets the acoustic compensation condition both avoids the problem of low acoustic scores caused by inaccurate or insufficient pronunciation and prevents acoustic compensation of sets that do not meet the condition, improving the accuracy and effectiveness of the acoustic compensation of the target pronunciation unit set.
In this embodiment, step s71 includes the following steps s81 to s84.
s81, after the acoustic recognition processing is completed, detecting whether there is a target pronunciation unit in the target pronunciation unit set that is the same as a first pronunciation unit, where the first pronunciation unit is a pronunciation unit to be compensated that is obtained through statistics in the historical audio recognition process.
s82, if so, verifying whether the acoustic score of the target pronunciation unit is smaller than the preset acoustic score threshold.
s83, if the acoustic score of the target pronunciation unit is smaller than the preset acoustic score threshold, counting the number of all pronunciation units in the target pronunciation unit set whose acoustic scores are greater than the preset acoustic score threshold.
s84, if the counted number is greater than the third number threshold, determining that the target pronunciation unit set meets the acoustic compensation condition.
In steps s81 to s84, the audio recognition device may detect, in combination with historical empirical data after the acoustic recognition processing is completed, whether the target pronunciation unit set meets the acoustic compensation condition. Specifically, after the acoustic recognition processing is completed, the audio recognition device detects whether the target pronunciation unit set contains a target pronunciation unit identical to a first pronunciation unit; if so, the target pronunciation unit is one that is prone to insufficient or inaccurate pronunciation, and it is verified whether its acoustic score is smaller than the preset acoustic score threshold. If the acoustic score of the target pronunciation unit is smaller than the preset acoustic score threshold, its acoustic score is low, and the number of all pronunciation units in the target pronunciation unit set whose acoustic scores are greater than the preset acoustic score threshold is counted. If the counted number is greater than the third number threshold, the acoustic scores of most pronunciation units in the set are high and those of only a few are low, i.e., only the target pronunciation unit is insufficiently pronounced or pronounced with low accuracy; in other words, the accuracy of the target pronunciation unit set is high and only a few pronunciation units are mispronounced, so it is determined that the target pronunciation unit set meets the acoustic compensation condition. In this way, acoustic compensation processing of low-accuracy target pronunciation unit sets is avoided, and the accuracy of the acoustic compensation processing of the target pronunciation unit set is improved. It should be noted that the preset acoustic score threshold and the first, second, third and fourth number thresholds may all be obtained through statistics over historical audio recognition, and the first, second, third and fourth number thresholds may be dynamically adjusted according to the number of pronunciation units in the target pronunciation unit set.
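The post-completion check of steps s81-s84 can be sketched as follows; the representation of the target pronunciation unit set as (unit, score) pairs and the threshold values are assumptions for illustration.

```python
def meets_condition_after_recognition(unit_set, first_units,
                                      score_threshold, count_threshold):
    """unit_set: list of (pronunciation unit, acoustic score) pairs."""
    for symbol, score in unit_set:
        # s81/s82: a target unit matching a first pronunciation unit scores low
        if symbol in first_units and score < score_threshold:
            # s83: count the units in the set scoring above the threshold
            high = sum(1 for _, s in unit_set if s > score_threshold)
            # s84: most units score high, so only a few need compensation
            return high > count_threshold
    return False

# Hypothetical set for the word "better": the flapped "t" scores low
print(meets_condition_after_recognition(
    [("b", 0.9), ("eh", 0.8), ("t", 0.4), ("er", 0.85)],
    first_units={"t"}, score_threshold=0.6, count_threshold=2))  # -> True
```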
In this embodiment, step s71 includes the following steps s91 to s93.
s91, after the acoustic recognition processing is completed, judging whether there is a target pronunciation unit in the target pronunciation unit set whose acoustic score is smaller than the preset acoustic score threshold.
s92, if such a target pronunciation unit exists, counting the number of all pronunciation units in the target pronunciation unit set whose acoustic scores are greater than the preset acoustic score threshold.
s93, if the counted number is greater than the fourth number threshold, determining that the target pronunciation unit set meets the acoustic compensation condition.
In steps s91 to s93, the audio recognition device may detect, after the acoustic recognition processing is completed, whether the target pronunciation unit set meets the acoustic compensation condition. Specifically, after the acoustic recognition processing is completed, it is judged whether the target pronunciation unit set contains a target pronunciation unit whose acoustic score is smaller than the preset acoustic score threshold; if so, the acoustic score of that target pronunciation unit is low, and the number of all pronunciation units in the set whose acoustic scores are greater than the preset acoustic score threshold is counted. If the counted number is greater than the fourth number threshold, the acoustic scores of most pronunciation units in the set are high and those of only a few are low, i.e., only the target pronunciation unit is insufficiently pronounced or pronounced with low accuracy; in other words, the accuracy of the target pronunciation unit set is high and only a few pronunciation units are mispronounced, so it is determined that the target pronunciation unit set meets the acoustic compensation condition. In this way, acoustic compensation processing of low-accuracy target pronunciation unit sets is avoided, and the accuracy of the acoustic compensation processing of the target pronunciation unit set is improved.
In this embodiment, step S103 includes the following steps s111 to s112.
s111, performing acoustic compensation processing on the acoustic score of the target pronunciation unit by using the acoustic scores of the pronunciation units in the target pronunciation unit set other than the target pronunciation unit, so as to obtain the compensated acoustic score of the target pronunciation unit.
s112, updating the target pronunciation unit set with the compensated acoustic score of the target pronunciation unit to obtain the target pronunciation unit set after the acoustic compensation processing.
In steps s111 to s112, when it is detected, after the acoustic recognition processing is completed, that the target pronunciation unit set meets the acoustic compensation condition, the acoustic compensation processing may be performed on the set. Specifically, the acoustic score of the target pronunciation unit can be compensated using the acoustic scores of the other pronunciation units in the target pronunciation unit set, so as to obtain its compensated acoustic score. In one embodiment, the average or the maximum of the acoustic scores of the pronunciation units other than the target pronunciation unit may be used to compensate the acoustic score of the target pronunciation unit. In another embodiment, an acoustic score may be selected at random from the acoustic scores of the other pronunciation units and used for the compensation. Further, the target pronunciation unit set is updated with the compensated acoustic score of the target pronunciation unit to obtain the target pronunciation unit set after the acoustic compensation processing. Since only the insufficiently or inaccurately pronounced pronunciation units in the target pronunciation unit set are compensated, the acoustic scores of the pronunciation units in the set are raised while the accuracy of the acoustic compensation of the set is improved.
In this embodiment, step s111 includes the following steps s211 to s214.
s211, calculating a second average value of acoustic scores of other pronunciation units except the target pronunciation unit in the target pronunciation unit set.
s212, obtaining the probability that the acoustic score of the target pronunciation unit is smaller than the preset acoustic score threshold value.
s213, determining a compensation acoustic score of the target pronunciation unit according to the second average value and the probability.
s214, determining the sum of the acoustic score of the target pronunciation unit and the compensation acoustic score as the compensated acoustic score of the target pronunciation unit.
In steps s211 to s214, the audio recognition device may perform acoustic compensation processing on the acoustic score of the target pronunciation unit by using the average value of the acoustic scores of the pronunciation units other than the target pronunciation unit in the target pronunciation unit set, to obtain the compensated acoustic score of the target pronunciation unit. Specifically, the audio recognition device may calculate the second average value of the acoustic scores of the pronunciation units other than the target pronunciation unit in the target pronunciation unit set by using a preset averaging algorithm. Further, the probability that the acoustic score of the target pronunciation unit is smaller than the preset acoustic score threshold is obtained, and the compensation acoustic score of the target pronunciation unit is determined according to the second average value and the probability; the sum of the acoustic score of the target pronunciation unit and the compensation acoustic score is then determined as the compensated acoustic score of the target pronunciation unit. The compensated acoustic score of the target pronunciation unit can be expressed by the following formula (2).
$$P(x_i') = P(x_i) + P_{prior}(x_i)\cdot\bar{P}, \qquad \bar{P}=\frac{1}{N-1}\sum_{j\neq i}P(x_j) \tag{2}$$

In formula (2), $x_i$ represents the i-th pronunciation unit in the target pronunciation unit set, i.e., $x_i$ is the target pronunciation unit; $P(x_i)$ represents the acoustic score of the target pronunciation unit; $P_{prior}(x_i)$ represents the probability that the acoustic score of the target pronunciation unit is smaller than the preset acoustic score threshold; $\bar{P}$ represents the second average value; $N$ represents the number of pronunciation units in the target pronunciation unit set; $P_{prior}(x_i)\cdot\bar{P}$ represents the compensation acoustic score; and $P(x_i')$ represents the compensated acoustic score of the target pronunciation unit.
In one embodiment, step s104 includes the following steps s311 to s313.
s311, performing text recognition on the target pronunciation unit set after the acoustic compensation processing, to obtain candidate text information corresponding to the pronunciation data and a language score of the candidate text information.
s312, determining the acoustic score of the target pronunciation unit set according to the acoustic score of each pronunciation unit in the target pronunciation unit set after the acoustic compensation processing.
s313, if the sum of the acoustic score of the target pronunciation unit set and the language score of the candidate text information is greater than a preset score threshold, determining the candidate text information as the text information corresponding to the pronunciation data.
In steps s311 to s313, the audio recognition device may query, through the pronunciation dictionary, the candidate text information corresponding to the target pronunciation unit set after the acoustic compensation processing, and calculate the language score of the candidate text information according to the language model. The product of the acoustic scores of the pronunciation units in the target pronunciation unit set after the acoustic compensation processing is determined as the acoustic score of the target pronunciation unit set. If the sum of the acoustic score of the target pronunciation unit set and the language score is greater than the preset score threshold, the candidate text information is determined as the text information corresponding to the pronunciation data.
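The decode-time scoring in steps s311 to s313 can be sketched as below; the threshold value and the score lists are assumptions for illustration.

```python
import math

PRESET_SCORE_THRESHOLD = 1.2  # assumed preset score threshold

def accept_candidate(compensated_unit_scores, language_score):
    # Acoustic score of the set = product of the compensated unit scores.
    acoustic_score = math.prod(compensated_unit_scores)
    return acoustic_score + language_score > PRESET_SCORE_THRESHOLD

print(accept_candidate([0.9, 0.85, 0.8, 0.88, 0.9], language_score=0.9))  # True
```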
In one embodiment, step s104 is followed by the following steps s411 to s413.
s411, detecting whether the text information corresponding to the pronunciation data includes a field matched with an operation instruction.
s412, if yes, generating a target operation instruction according to the text information corresponding to the pronunciation data.
s413, sending the target operation instruction to the terminal, where the terminal executes the target operation instruction.
In steps s411 to s413, the audio recognition device may generate an operation instruction from the text information corresponding to the pronunciation data. Specifically, it may be detected whether the text information corresponding to the pronunciation data includes a field matched with an operation instruction; for example, the field may include "open", "close", "start", and the like. If so, the audio recognition device may generate a target operation instruction according to the text information corresponding to the pronunciation data. When the audio recognition device is a server, the server may send the target operation instruction to the terminal, and the terminal executes the target operation instruction; when the audio recognition device is a terminal, the terminal may execute the target operation instruction directly.
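The field matching in steps s411 to s413 might look like the following sketch; the field list and the instruction format are illustrative assumptions.

```python
COMMAND_FIELDS = ("open", "close", "start")  # example fields matched to operation instructions

def build_operation_instruction(text):
    """Return a target operation instruction if the text contains a command field."""
    for field in COMMAND_FIELDS:
        if field in text:
            return {"action": field, "text": text}  # hypothetical instruction format
    return None  # no matching field: no instruction is generated

print(build_operation_instruction("open the next page"))  # {'action': 'open', ...}
```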
The audio recognition method provided in the present application can be applied to scenarios such as automatic translation, voice search, voice input, and voice dialogue. The application is described in detail below by taking the case where the audio recognition device is a server as an example. Referring to fig. 6, fig. 6 illustrates an audio recognition method provided in the present application.
As shown in fig. 6, the terminal includes a search interface 12, where the search interface 12 includes an audio control 13 and a text input box 14. The search interface may be a browser, a user interface of a social application program, and so on; the text input box allows a user to enter the text information to be searched. When the terminal detects a clicking operation on the audio control 13, the terminal may collect the pronunciation data input by the user through the voice device and send the pronunciation data to the server.
As shown in fig. 6, the server may obtain a set of acoustic features corresponding to the pronunciation data. Specifically, filtering processing can be performed on pronunciation data to obtain processed pronunciation data; and carrying out frame processing on the processed pronunciation data to obtain multi-frame pronunciation sub-data. Further, frequency domain transformation is carried out on each frame of pronunciation sub data in the multi-frame pronunciation sub data to obtain frequency domain pronunciation sub data, and feature extraction is carried out on each frame of pronunciation sub data in the frequency domain to obtain an acoustic feature set corresponding to the pronunciation data. The acoustic feature set comprises a plurality of acoustic features, the acoustic features in the acoustic feature set are arranged in sequence, and each acoustic feature corresponds to one frame of pronunciation sub-data.
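A compact sketch of this front-end (framing, frequency-domain transform, per-frame feature extraction) is given below; the frame sizes, window, and log-spectral feature are common choices assumed for illustration, not choices mandated by the application.

```python
import numpy as np

def extract_features(signal, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Turn filtered pronunciation data into one acoustic feature per frame, in order."""
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    features = []
    for start in range(0, len(signal) - frame + 1, hop):
        sub = signal[start:start + frame]                        # one frame of pronunciation sub-data
        spectrum = np.abs(np.fft.rfft(sub * np.hamming(frame)))  # frequency-domain transform
        features.append(np.log(spectrum + 1e-8))                 # simple log-spectral feature
    return features  # ordered acoustic feature set

features = extract_features(np.random.randn(16000))  # one second of dummy audio
print(len(features))  # about 98 frames
```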
As shown in fig. 6, the server may perform acoustic recognition processing on the acoustic feature set of the pronunciation data to obtain a plurality of candidate pronunciation unit sets corresponding to the pronunciation data, where each candidate pronunciation unit set includes a plurality of pronunciation units and the acoustic score of each pronunciation unit. Here, three candidate pronunciation unit sets are taken as an example: candidate pronunciation unit set 1, candidate pronunciation unit set 2, and candidate pronunciation unit set 3. After the acoustic recognition processing is completed, whether each candidate pronunciation unit set satisfies the acoustic compensation condition is detected according to the acoustic scores of the pronunciation units in that set. If it is detected that the acoustic score of every pronunciation unit in candidate pronunciation unit set 2 is smaller than the preset acoustic score threshold, it is determined that candidate pronunciation unit set 2 does not satisfy the acoustic compensation condition. If it is detected that the acoustic score of every pronunciation unit in candidate pronunciation unit set 3 is greater than or equal to the preset acoustic score threshold, it is determined that candidate pronunciation unit set 3 does not satisfy the acoustic compensation condition. If it is detected that candidate pronunciation unit set 1 contains a target pronunciation unit whose acoustic score is smaller than the preset acoustic score threshold, and the number of pronunciation units in candidate pronunciation unit set 1 whose acoustic scores are greater than or equal to the preset acoustic score threshold is greater than the fourth number threshold, it is determined that candidate pronunciation unit set 1 satisfies the acoustic compensation condition. For example, candidate pronunciation unit set 1 includes the pronunciation units n, e, k, s, t; if it is detected that the acoustic score of the pronunciation unit t is smaller than the preset acoustic score threshold while the acoustic scores of the other pronunciation units are all greater than or equal to the preset acoustic score threshold, it is determined that candidate pronunciation unit set 1 satisfies the acoustic compensation condition. Further, the average value of the acoustic scores of the pronunciation units n, e, k, s may be calculated, and the acoustic score of the pronunciation unit t may be subjected to acoustic compensation processing based on this average value. Finally, text recognition is performed on candidate pronunciation unit set 1 after the acoustic compensation processing and on candidate pronunciation unit set 3, respectively, to obtain candidate text information 1 and candidate text information 2 corresponding to the pronunciation data, together with a decoding score for each piece of candidate text information. The candidate text information with the highest decoding score is selected from candidate text information 1 and candidate text information 2 as the text information of the pronunciation data.
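Continuing the n, e, k, s, t example with assumed scores, the compensation of the unit t can be traced numerically as follows; the scores and the prior probability are invented for this walk-through.

```python
scores = {"n": 0.9, "e": 0.85, "k": 0.8, "s": 0.88, "t": 0.3}  # assumed acoustic scores
others = [v for unit, v in scores.items() if unit != "t"]
average = sum(others) / len(others)   # average of n, e, k, s = 0.8575
p_prior = 0.7                         # assumed P_prior for the unit t
scores["t"] += p_prior * average      # compensated score of t, about 0.90
print(round(scores["t"], 3))          # 0.9
```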
As shown in fig. 6, the server may transmit the text information corresponding to the pronunciation data to the terminal; for example, the text information includes "next", and the terminal may display the text information in the text input box 14 in the search interface. Optionally, the server may further generate a search instruction according to the text information and send the search instruction to the terminal, where the search instruction is used to instruct the terminal to search for entries associated with the text information. The terminal may receive and execute the search instruction, and output a plurality of entries associated with the text information.
The embodiment of the application provides an audio recognition apparatus, which may be disposed in an audio recognition device; for example, the audio recognition apparatus may be a decoder in the audio recognition device or an application program having a decoding function. Referring to fig. 7, the apparatus includes:
an obtaining unit 701, configured to obtain pronunciation data to be identified, and extract an acoustic feature set of the pronunciation data, where the acoustic feature set includes a plurality of acoustic features;
a recognition unit 702, configured to perform acoustic recognition processing on the acoustic feature set of the pronunciation data to obtain a target pronunciation unit set corresponding to the pronunciation data, where the target pronunciation unit set includes a plurality of pronunciation units and the acoustic score of each pronunciation unit;
a compensation unit 703, configured to perform acoustic compensation processing on the acoustic score of each pronunciation unit in the target pronunciation unit set;
the recognition unit 702 is further configured to perform text recognition on the target pronunciation unit set after the acoustic compensation processing to obtain text information corresponding to the pronunciation data.
Optionally, the recognition unit 702 is configured to sequentially identify each acoustic feature in the acoustic feature set according to the arrangement order of the acoustic features in the acoustic feature set; calculate an acoustic score for a pronunciation unit each time one pronunciation unit is identified; and obtain the target pronunciation unit set when every acoustic feature in the acoustic feature set has been identified; wherein the order in which the pronunciation units in the target pronunciation unit set are identified matches the pronunciation order of the pronunciation units.
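As a sketch of this sequential behavior, assuming a hypothetical `acoustic_model` callable that maps one acoustic feature to a (unit, score) pair:

```python
def acoustic_recognition(feature_set, acoustic_model):
    """Identify pronunciation units feature by feature, preserving pronunciation order."""
    target_units = []
    for feature in feature_set:                # arrangement order of the acoustic features
        unit, score = acoustic_model(feature)  # hypothetical model interface
        target_units.append((unit, score))     # identification order matches pronunciation order
    return target_units                        # the target pronunciation unit set
```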
Optionally, the apparatus further includes a judging unit 704, configured to judge, during the acoustic recognition processing, whether the target pronunciation unit set satisfies an acoustic compensation condition according to the acoustic score of each pronunciation unit in the target pronunciation unit set; or judge, after the acoustic recognition processing is completed, whether the target pronunciation unit set satisfies an acoustic compensation condition according to the acoustic score of each pronunciation unit in the target pronunciation unit set; and if so, trigger the step of performing acoustic compensation processing on the acoustic score of each pronunciation unit in the target pronunciation unit set.
Optionally, the judging unit 704 is specifically configured to: each time one pronunciation unit is identified, judge whether the currently identified pronunciation unit is a first pronunciation unit, where the first pronunciation unit is a pronunciation unit to be compensated obtained by statistics in the historical audio recognition process; if yes, verify whether the acoustic score of the currently identified pronunciation unit is smaller than a preset acoustic score threshold; if the acoustic score is smaller than the preset acoustic score threshold, count the number of pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit, and compare the acoustic score of each of those pronunciation units with the preset acoustic score threshold; and if the counted number is greater than a first number threshold and the acoustic score of each pronunciation unit in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit is greater than or equal to the preset acoustic score threshold, determine that the target pronunciation unit set satisfies the acoustic compensation condition.
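A sketch of this during-recognition check follows; the historical unit list, the threshold values, and the data layout are assumptions for the example.

```python
HISTORICAL_FIRST_UNITS = {"t", "d"}  # assumed units to be compensated, from past runs
SCORE_THRESHOLD = 0.6                # assumed preset acoustic score threshold
FIRST_NUMBER_THRESHOLD = 2           # assumed first number threshold

def check_during_recognition(preceding_scores, unit, score):
    """preceding_scores: scores of units already identified, in pronunciation order."""
    if unit not in HISTORICAL_FIRST_UNITS or score >= SCORE_THRESHOLD:
        return False
    # All preceding units must score at or above the threshold, and there must
    # be more of them than the first number threshold.
    return (len(preceding_scores) > FIRST_NUMBER_THRESHOLD
            and all(s >= SCORE_THRESHOLD for s in preceding_scores))

print(check_during_recognition([0.9, 0.85, 0.8], "t", 0.3))  # True
```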
Optionally, the judging unit 704 is specifically configured to: each time one pronunciation unit is identified, verify whether the acoustic score of the currently identified pronunciation unit is smaller than a preset acoustic score threshold; if the acoustic score is smaller than the preset acoustic score threshold, count the number of pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit, and compare the acoustic score of each of those pronunciation units with the preset acoustic score threshold; and if the counted number is greater than a second number threshold and the acoustic score of each pronunciation unit in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit is greater than or equal to the preset acoustic score threshold, determine that the target pronunciation unit set satisfies the acoustic compensation condition.
Optionally, the compensation unit 703 is specifically configured to perform acoustic compensation processing on the acoustic score of the currently identified pronunciation unit by using the acoustic scores of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit, to obtain the compensated acoustic score of the currently identified pronunciation unit; and update the target pronunciation unit set with the compensated acoustic score of the currently identified pronunciation unit to obtain the target pronunciation unit set after the acoustic compensation processing.
Optionally, the compensation unit 703 is specifically configured to calculate a first average value of the acoustic scores of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit; obtain the probability that the acoustic score of the currently identified pronunciation unit is smaller than the preset acoustic score threshold; determine a compensation acoustic score of the currently identified pronunciation unit according to the first average value and the probability; and determine the sum of the acoustic score of the currently identified pronunciation unit and the compensation acoustic score as the compensated acoustic score of the currently identified pronunciation unit.
Optionally, the audio recognition apparatus further includes the judging unit 704, which is specifically configured to detect, after the acoustic recognition processing is completed, whether a target pronunciation unit identical to a first pronunciation unit exists in the target pronunciation unit set, where the first pronunciation unit is a pronunciation unit to be compensated obtained by statistics in the historical audio recognition process; if yes, verify whether the acoustic score of the target pronunciation unit is smaller than a preset acoustic score threshold; if the acoustic score is smaller than the preset acoustic score threshold, count the number of all pronunciation units in the target pronunciation unit set whose acoustic scores are greater than or equal to the preset acoustic score threshold; and if the counted number is greater than a third number threshold, determine that the target pronunciation unit set satisfies the acoustic compensation condition.
Optionally, the judging unit 704 is specifically configured to judge, after the acoustic recognition processing is completed, whether a target pronunciation unit whose acoustic score is smaller than a preset acoustic score threshold exists in the target pronunciation unit set; if such a target pronunciation unit exists, count the number of all pronunciation units in the target pronunciation unit set whose acoustic scores are greater than or equal to the preset acoustic score threshold; and if the counted number is greater than a fourth number threshold, determine that the target pronunciation unit set satisfies the acoustic compensation condition.
Optionally, the compensation unit 703 is specifically configured to perform acoustic compensation processing on the acoustic score of the target pronunciation unit by using the acoustic scores of the pronunciation units other than the target pronunciation unit in the target pronunciation unit set, to obtain the compensated acoustic score of the target pronunciation unit; and update the target pronunciation unit set with the compensated acoustic score of the target pronunciation unit to obtain the target pronunciation unit set after the acoustic compensation processing.
Optionally, the compensation unit 703 is specifically configured to calculate a second average value of the acoustic scores of the pronunciation units other than the target pronunciation unit in the target pronunciation unit set; obtain the probability that the acoustic score of the target pronunciation unit is smaller than the preset acoustic score threshold; determine a compensation acoustic score of the target pronunciation unit according to the second average value and the probability; and determine the sum of the acoustic score of the target pronunciation unit and the compensation acoustic score as the compensated acoustic score of the target pronunciation unit.
Optionally, the recognition unit 702 is specifically configured to perform text recognition on the target pronunciation unit set after the acoustic compensation processing to obtain candidate text information corresponding to the pronunciation data and a language score of the candidate text information; determine the acoustic score of the target pronunciation unit set according to the acoustic score of each pronunciation unit in the target pronunciation unit set after the acoustic compensation processing; and if the sum of the acoustic score of the target pronunciation unit set and the language score is greater than a preset score threshold, determine the candidate text information as the text information corresponding to the pronunciation data.
Optionally, the audio recognition apparatus further includes a generating unit 705, configured to detect whether the text information corresponding to the pronunciation data includes a field matched with an operation instruction; and if yes, generate a target operation instruction according to the text information corresponding to the pronunciation data and send the target operation instruction to the terminal, where the terminal executes the target operation instruction.
In the embodiment of the application, performing acoustic compensation processing on the acoustic scores of the pronunciation units in the target pronunciation unit set can raise the acoustic scores of those pronunciation units, thereby avoiding the problem that the acoustic score of a pronunciation unit is low because the pronunciation unit is inaccurately or insufficiently pronounced. In addition, performing text recognition on the target pronunciation unit set after the acoustic compensation processing to obtain the text information corresponding to the pronunciation data can improve the accuracy of recognizing the pronunciation data, while the recognition of the pronunciation data of other audio words is not affected. Furthermore, the audio recognition accuracy is improved without optimizing the acoustic model through a large amount of training data; that is, there is no need to collect a large amount of training data or to perform a large amount of iterative training on the acoustic model, which reduces the difficulty of data collection and saves substantial resources.
An embodiment of the present application provides an audio recognition device, please refer to fig. 8. The audio recognition device includes a processor 151, a user interface 152, a network interface 154, and a storage device 155, which are connected via a bus 153.
The user interface 152 is used for enabling human-machine interaction, and may include a display screen, a keyboard, and the like. The network interface 154 is used for communication connection with external devices. The storage device 155 is coupled to the processor 151 and is used for storing various software programs and/or sets of instructions. In particular implementations, the storage device 155 may include high-speed random access memory, and may also include non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The storage device 155 may store an operating system such as ANDROID, IOS, or WINDOWS, or an embedded operating system such as LINUX. The storage device 155 may also store a network communication program, which may be used to communicate with one or more additional devices and one or more audio recognition devices. The storage device 155 may also store a user interface program that can vividly display the content of an application program through a graphical operation interface, and receive control operations on the application program from a user through input controls such as menus, dialog boxes, and buttons. The storage device 155 may also store acoustic models, language models, pronunciation dictionaries, and the like.
In one embodiment, the storage device 155 may be used to store one or more instructions, and the processor 151 may invoke the one or more instructions to implement an audio recognition method. Specifically, the processor 151 invokes the one or more instructions to perform the following steps:
acquiring pronunciation data to be identified, and extracting an acoustic feature set of the pronunciation data, wherein the acoustic feature set comprises a plurality of acoustic features;
performing acoustic recognition processing on the acoustic feature set of the pronunciation data to obtain a target pronunciation unit set corresponding to the pronunciation data, wherein the target pronunciation unit set comprises a plurality of pronunciation units and acoustic scores of each pronunciation unit;
performing acoustic compensation processing on acoustic scores of all pronunciation units in the target pronunciation unit set;
and carrying out text recognition on the target pronunciation unit set subjected to the acoustic compensation processing to obtain text information corresponding to the pronunciation data.
Optionally, the processor invokes an instruction to execute the following steps:
the acoustic recognition processing for the acoustic feature set of the pronunciation data comprises the following steps:
sequentially identifying each acoustic feature in the acoustic feature set according to the arrangement order of the acoustic features in the acoustic feature set;
calculating an acoustic score for a pronunciation unit each time one pronunciation unit is identified;
obtaining the target pronunciation unit set when every acoustic feature in the acoustic feature set has been identified;
wherein the order in which the pronunciation units in the target pronunciation unit set are identified matches the pronunciation order of the pronunciation units.
Optionally, the processor invokes an instruction to execute the following steps:
during the acoustic recognition processing, judging whether the target pronunciation unit set satisfies an acoustic compensation condition according to the acoustic score of each pronunciation unit in the target pronunciation unit set; or after the acoustic recognition processing is completed, judging whether the target pronunciation unit set satisfies an acoustic compensation condition according to the acoustic score of each pronunciation unit in the target pronunciation unit set;
and if so, executing the step of performing acoustic compensation processing on the acoustic score of each pronunciation unit in the target pronunciation unit set.
Optionally, the processor invokes an instruction to execute the following steps:
each time one pronunciation unit is identified, judging whether the currently identified pronunciation unit is a first pronunciation unit, where the first pronunciation unit is a pronunciation unit to be compensated obtained by statistics in the historical audio recognition process;
if yes, verifying whether the acoustic score of the currently identified pronunciation unit is smaller than a preset acoustic score threshold;
if the acoustic score is smaller than the preset acoustic score threshold, counting the number of pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit, and comparing the acoustic score of each of those pronunciation units with the preset acoustic score threshold;
and if the counted number is greater than a first number threshold and the acoustic score of each pronunciation unit in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit is greater than or equal to the preset acoustic score threshold, determining that the target pronunciation unit set satisfies the acoustic compensation condition.
Optionally, the processor invokes an instruction to execute the following steps:
each time one pronunciation unit is identified, verifying whether the acoustic score of the currently identified pronunciation unit is smaller than a preset acoustic score threshold;
if the acoustic score is smaller than the preset acoustic score threshold, counting the number of pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit, and comparing the acoustic score of each of those pronunciation units with the preset acoustic score threshold;
and if the counted number is greater than a second number threshold and the acoustic score of each pronunciation unit in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit is greater than or equal to the preset acoustic score threshold, determining that the target pronunciation unit set satisfies the acoustic compensation condition.
Optionally, the processor invokes an instruction to execute the following steps:
performing acoustic compensation processing on the acoustic score of the currently identified pronunciation unit by using the acoustic scores of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit, to obtain the compensated acoustic score of the currently identified pronunciation unit;
and updating the target pronunciation unit set with the compensated acoustic score of the currently identified pronunciation unit to obtain the target pronunciation unit set after the acoustic compensation processing.
Optionally, the processor invokes an instruction to execute the following steps:
calculating a first average value of the acoustic scores of all pronunciation units in the target pronunciation unit set whose pronunciation order precedes that of the currently identified pronunciation unit;
obtaining the probability that the acoustic score of the currently identified pronunciation unit is smaller than the preset acoustic score threshold;
determining a compensation acoustic score of the currently identified pronunciation unit according to the first average value and the probability;
and determining the sum of the acoustic score of the currently identified pronunciation unit and the compensation acoustic score as the compensated acoustic score of the currently identified pronunciation unit.
Optionally, the processor invokes an instruction to execute the following steps:
after the acoustic recognition processing is finished, detecting whether a target pronunciation unit which is the same as a first pronunciation unit exists in the target pronunciation unit set, wherein the first pronunciation unit is a pronunciation unit to be compensated which is obtained through statistics in a historical audio recognition process;
if yes, verifying whether the acoustic score of the target pronunciation unit is smaller than a preset acoustic score threshold value;
if the acoustic score is smaller than the preset acoustic score threshold, counting the number of all the pronunciation units with acoustic scores larger than or equal to the preset acoustic score threshold in the target pronunciation unit set;
and if the counted number is larger than a third number threshold, determining that the target pronunciation unit set meets the acoustic compensation condition.
Optionally, the processor invokes an instruction to execute the following steps:
after the acoustic recognition processing is completed, judging whether a target pronunciation unit whose acoustic score is smaller than a preset acoustic score threshold exists in the target pronunciation unit set;
if such a target pronunciation unit exists, counting the number of all pronunciation units in the target pronunciation unit set whose acoustic scores are greater than or equal to the preset acoustic score threshold;
and if the counted number is greater than a fourth number threshold, determining that the target pronunciation unit set satisfies the acoustic compensation condition.
Optionally, the processor invokes an instruction to execute the following steps:
performing acoustic compensation processing on the acoustic score of the target pronunciation unit by using the acoustic scores of the pronunciation units other than the target pronunciation unit in the target pronunciation unit set, to obtain the compensated acoustic score of the target pronunciation unit;
and updating the target pronunciation unit set with the compensated acoustic score of the target pronunciation unit to obtain the target pronunciation unit set after the acoustic compensation processing.
Optionally, the processor invokes an instruction to execute the following steps:
calculating a second average value of the acoustic scores of the pronunciation units other than the target pronunciation unit in the target pronunciation unit set;
obtaining the probability that the acoustic score of the target pronunciation unit is smaller than the preset acoustic score threshold;
determining a compensation acoustic score of the target pronunciation unit according to the second average value and the probability;
and determining the sum of the acoustic score of the target pronunciation unit and the compensation acoustic score as the compensated acoustic score of the target pronunciation unit.
Optionally, the processor invokes an instruction to execute the following steps:
performing text recognition on the target pronunciation unit set after the acoustic compensation processing to obtain candidate text information corresponding to the pronunciation data and a language score of the candidate text information;
determining the acoustic score of the target pronunciation unit set according to the acoustic score of each pronunciation unit in the target pronunciation unit set after the acoustic compensation processing;
and if the sum of the acoustic score of the target pronunciation unit set and the language score is greater than a preset score threshold, determining the candidate text information as the text information corresponding to the pronunciation data.
Optionally, the processor invokes an instruction to execute the following steps:
detecting whether text information corresponding to the pronunciation data comprises a field matched with an operation instruction or not;
If yes, generating a target operation instruction according to the text information corresponding to the pronunciation data, sending the target operation instruction to a terminal, and executing the target operation instruction by the terminal.
In the embodiment of the application, performing acoustic compensation processing on the acoustic scores of the pronunciation units in the target pronunciation unit set can raise the acoustic scores of those pronunciation units, thereby avoiding the problem that the acoustic score of a pronunciation unit is low because the pronunciation unit is inaccurately or insufficiently pronounced. In addition, performing text recognition on the target pronunciation unit set after the acoustic compensation processing to obtain the text information corresponding to the pronunciation data can improve the accuracy of recognizing the pronunciation data, while the recognition of the pronunciation data of other audio words is not affected. Furthermore, the audio recognition accuracy is improved without optimizing the acoustic model through a large amount of training data; that is, there is no need to collect a large amount of training data or to perform a large amount of iterative training on the acoustic model, which reduces the difficulty of data collection and saves substantial resources.
The embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. For the implementation and beneficial effects of the program in solving the problem, reference may be made to the implementation and beneficial effects of the audio recognition method described in fig. 2, and the repeated description is omitted here.
The foregoing disclosure is only illustrative of some of the embodiments of the present application and is not to be construed as limiting the scope of the appended claims; therefore, all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.