Electronic product voice control system for restraining response time delay
Technical Field
The invention belongs to the technical field of voice control, and particularly relates to a voice control system of an electronic product for restraining response time delay.
Background
With the continuous development of computer technology, electronic products are widely used in daily life, and by combining with a voice recognition technology, the electronic products can be controlled by voice to perform corresponding actions, so that the use of the electronic products is further facilitated.
At present, patent No. CN201510989140.0 discloses a voice control device, including: a voice acquisition module for receiving a voice signal; a voice recognition module for generating voice features from the voice signal, evaluating the voice features according to the current working mode of the voice control device, and generating a voice command when the voice features match a voice template corresponding to the current working mode; a first communication module for wireless communication with an intelligent terminal; and a control module for generating a control instruction according to the voice command and sending the control instruction to the intelligent terminal through the first communication module, so that the intelligent terminal operates according to the control instruction. That device evaluates the voice features according to the working mode, but it has several disadvantages: the response delay is long, operating an electronic product by voice is time-consuming, the practicability of voice control is reduced, the device is inconvenient to use, the precision of voiceprint recognition is low, and instructions inconsistent with the user's intention are easily issued.
Therefore, it is necessary to solve the above problems of the voice control system of electronic products, namely the long response delay and the low accuracy of voiceprint recognition, so as to broaden the usage scenarios of the electronic product.
Disclosure of Invention
(1) Technical problem to be solved
In view of the deficiencies of the prior art, the present invention provides a voice control system for an electronic product capable of suppressing response time delay, which aims to solve the technical problems of the prior art that the response delay time is long, the time consumed for operating the electronic product through voice is long, the practicability of voice control is reduced, the use is inconvenient, the accuracy of voiceprint recognition is low, and an instruction inconsistent with the intention of a user is easy to issue.
(2) Technical scheme
In order to solve the above technical problems, the present invention provides a voice control system for electronic products with suppressed response time delay, the voice control system comprising a sample collection module, a voice collection module, a digital-to-analog conversion module, a storage module, a voice recognition module, a control module and a communication module,
the sample acquisition module comprises new sample creation, voice recording, feature extraction and model training, wherein a plurality of users can be established by the sample acquisition module through new sample creation, so that different users can conveniently send instructions to the electronic product, the voice recording is used for acquiring voice data of the users, the recorded voice content comprises wake-up words and keywords, and the feature extraction extracts voice features from the acquired voice data according to the particularity and the stability of the voice;
the voice acquisition module is used for acquiring voice data sent out in the surrounding environment;
the digital-to-analog conversion module is used for converting the acquired analog signals into digital signals that are convenient to process, reducing or weakening the influence of noise and improving the accuracy of the acquired voice data, and a conversion algorithm is preset in the digital-to-analog conversion module: the continuously varying signal x(t) is sampled into a time-discrete signal x(n) at a sampling rate fs = 2.5fmax, where fmax is the highest frequency component of x(t), the instantaneous analog value obtained by sampling is held for a period of time, the continuous-amplitude sampled signal is quantized into a digital signal that is discrete in both time and amplitude, which introduces a quantization error, and the quantized signal is encoded into a binary code for output;
the storage module comprises an instruction library, a model library and a text library, wherein each preset instruction for controlling the electronic product to complete corresponding operation is arranged in the instruction library and consists of an operation code and an address code, the model library contains personal voiceprint templates of all users, and the text library contains preset words or sentences;
instruction comparison rules are preset in the voice recognition module: the obtained input instruction I is compared in sequence with the instructions in an instruction library k, wherein the instructions in k comprise I1, I2, I3, ..., In, a first comparison is carried out between I and I1, if the matching degree P1 of I and I1 is not less than 70%, I1 is retained in the result, otherwise the result is 0, a second comparison is then carried out between I and I2, if the matching degree P2 of I and I2 is less than 70%, the first result is kept unchanged, if P2 is not less than 70%, the first result is taken into account: if I1 was retained in the first result, I2 is retained when P1 is not greater than P2 and I1 is retained otherwise, and if the first result was 0, I2 is retained, the comparison continues in this way until I has been compared with all the instructions in the instruction library k, the final result Ix is output as the final instruction, and if the final result is 0, the input instruction is treated as invalid;
the control module is used for commanding each module to complete sample collection, voice collection, digital-to-analog conversion, storage, voice recognition and communication work within a specified time according to requirements;
the communication module is used for sending the final instruction to the electronic product, so that the electronic product can make corresponding operation according to the voice of a user, and the control module is preset with response time.
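The conversion algorithm preset in the digital-to-analog conversion module (sampling, holding, quantizing, encoding) can be illustrated with a minimal sketch. Only the sampling rate fs = 2.5fmax comes from the text; the function name `sample_and_encode`, the 8-bit word length and the input range of -1.0 to 1.0 are assumptions chosen for illustration.

```python
import math


def sample_and_encode(x, f_max, duration, n_bits=8, v_min=-1.0, v_max=1.0):
    """Sample x(t) at fs = 2.5 * f_max, quantize uniformly, encode as binary strings.

    n_bits, v_min and v_max are illustrative assumptions, not values from the text.
    """
    fs = 2.5 * f_max                          # sampling rate stated in the text
    n_samples = int(round(duration * fs))
    step = (v_max - v_min) / (2 ** n_bits - 1)  # quantization step size
    codes, errors = [], []
    for n in range(n_samples):
        held = x(n / fs)                      # sampling; the variable models the held value
        held = min(max(held, v_min), v_max)   # clip to the converter's input range
        level = round((held - v_min) / step)  # quantizing to the nearest level
        quantized = v_min + level * step
        errors.append(held - quantized)       # quantization error of this sample
        codes.append(format(level, f"0{n_bits}b"))  # encoding as a binary code
    return fs, codes, errors


# Example: a 1 kHz tone sampled for 10 ms
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
fs, codes, errors = sample_and_encode(tone, f_max=1000, duration=0.01)
```

With uniform rounding, the magnitude of each quantization error stays within half a quantization step, which is why a longer word length reduces the error.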
When the voice control system of this technical scheme is used, the voice acquisition module first acquires a voice command of the user. The digital-to-analog conversion module converts the continuously varying signal x(t) into a time-discrete sampled signal x(n) at a sampling rate fs = 2.5fmax, where fmax is the highest frequency component of x(t), holds the instantaneous analog value obtained by sampling for a period of time, quantizes the continuous-amplitude sampled signal into a digital signal that is discrete in both time and amplitude, which introduces a quantization error, and encodes the quantized signal into a binary code for output. The voice recognition module then compares the obtained input instruction I in sequence with the instructions in the instruction library k. In the first comparison, I is compared with I1: if the matching degree P1 of I and I1 is greater than or equal to 70%, I1 is retained in the result; otherwise the result is 0. In the second comparison, I is compared with I2: if P2 is less than 70%, the first result is kept unchanged; if P2 is greater than or equal to 70%, the first result is taken into account, so that if I1 was retained in the first result, I2 is retained when P1 is less than or equal to P2 and I1 is retained otherwise, and if the first result was 0, I2 is retained. The comparison continues in this way until I has been compared with all the instructions in the instruction library k, and the final result Ix is output as the final instruction; if the final result is 0, the input instruction is treated as invalid. Finally, the control module sends the instruction to be output to the electronic product through the communication module, so that the electronic product executes the corresponding operation.
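The sequential instruction comparison described above amounts to keeping a running best candidate among the library instructions whose matching degree reaches 70%. The following is a minimal sketch under stated assumptions: the text does not say how the matching degree P is computed, so a string similarity ratio from the standard library stands in for it, and the names `matching_degree` and `select_instruction` are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.70  # matching-degree threshold from the text


def matching_degree(a, b):
    # Assumption: a string similarity ratio stands in for the unspecified matching degree P.
    return SequenceMatcher(None, a, b).ratio()


def select_instruction(input_instruction, library, score=matching_degree):
    """Return the retained library instruction, or None (the result "0" in the text)."""
    best, best_score = None, 0.0
    for candidate in library:
        p = score(input_instruction, candidate)
        if p < THRESHOLD:
            continue                          # below 70%: previous result kept unchanged
        if best is None or best_score <= p:
            best, best_score = candidate, p   # previous P <= new P: retain the new instruction
    return best


library = ["turn on light", "turn off light", "volume up"]
final_instruction = select_instruction("turn on light", library)
```

When two instructions reach the same matching degree, the later one is retained, which follows the "less than or equal to" wording of the rule. A return value of `None` corresponds to the invalid-instruction case.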
Preferably, the particularity of the voice includes tone quality, duration, intensity and pitch, the model training models the speaker according to the voice features and establishes a personal voiceprint template specific to the user, and the maximum number of users that can be established by the sample acquisition module is 3, namely user 1, user 2 and user 3.
Preferably, a voice recording end judgment rule is preset in the voice acquisition module: when no voice information is acquired within 1s, the voice acquisition module judges that the voice recording is finished, and when the total acquisition time is more than 15s, the voice acquisition module automatically stops the voice recording operation.
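A minimal sketch of the recording end judgment follows, combining the 15 s cap with the 1 s silence criterion given in Example 1. It assumes the audio arrives as fixed-length frames, each already labelled as containing voice or not; the 100 ms frame length and the name `recording_end` are illustrative assumptions. Integer milliseconds are used to avoid floating-point drift when accumulating time.

```python
SILENCE_LIMIT_MS = 1000     # no voice for 1 s: voice input judged finished
MAX_RECORDING_MS = 15000    # total acquisition time above 15 s: recording stopped


def recording_end(voice_flags, frame_ms=100):
    """Return (elapsed_ms, reason) at the point where recording stops.

    voice_flags: iterable of booleans, one per frame (True = voice detected).
    frame_ms is an assumed frame length, not a value from the text.
    """
    elapsed = 0
    silence = 0
    for has_voice in voice_flags:
        elapsed += frame_ms
        silence = 0 if has_voice else silence + frame_ms
        if silence >= SILENCE_LIMIT_MS:
            return elapsed, "silence"
        if elapsed > MAX_RECORDING_MS:
            return elapsed, "timeout"
    return elapsed, "stream_ended"


# 2 s of speech followed by silence: recording ends after 1 s of silence
stop_time, reason = recording_end([True] * 20 + [False] * 15)
```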
Preferably, the voice recognition module includes a voiceprint recognition processing unit, a text conversion processing unit, a semantic parsing unit, and an instruction comparison unit, the voiceprint recognition processing unit compares the voice data collected by the voice collection module with the personal voiceprint template in the model library, the text conversion processing unit converts the voice data into text information, the semantic parsing unit performs semantic check and processing on the text information to generate a corresponding target instruction, the instruction comparison unit compares the target instruction generated by the semantic parsing unit with an instruction in the instruction library to determine whether the target instruction generated by the semantic parsing unit needs to be output.
Preferably, an identification algorithm is preset in the voiceprint recognition processing unit. First, it is judged whether the wake-up word is correct; if not, the voice data is invalid. If the wake-up word is correct, the personal voiceprint template of user 1 in the model library is called and compared with the wake-up word data collected by the voice acquisition module in four aspects, namely tone quality, duration, intensity and pitch. If the similarity value exceeds 95%, the wake-up word data is judged to belong to user 1; if the similarity value is lower than 95%, the personal voiceprint template of user 2 is called for the same comparison, and if the similarity value exceeds 95%, the wake-up word data is judged to belong to user 2; if the similarity value is lower than 95%, the personal voiceprint template of user 3 is called for the same comparison, and if the similarity value exceeds 95%, the wake-up word data is judged to belong to user 3; if the similarity value is still lower than 95%, the wake-up word data is judged to be invalid. If the wake-up word data is valid, the instruction voice data collected by the voice acquisition module is then compared in the same four aspects with the personal voiceprint templates of user 1, user 2 and user 3 in turn: as soon as the similarity value with a template exceeds 95%, the instruction voice data is judged to belong to the corresponding user, and if the similarity value is lower than 95% for all three templates, the instruction voice data is judged to be invalid.
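The identification algorithm can be sketched as two passes over the stored templates, one for the wake-up word data and one for the instruction voice data. The text does not define how the similarity value is computed from the four aspects, so the sketch assumes each aspect is a single number and averages a per-aspect relative closeness; the feature keys and function names are hypothetical.

```python
FEATURES = ("tone_quality", "duration", "intensity", "pitch")
SIMILARITY_THRESHOLD = 0.95   # 95% threshold from the text


def similarity(template, sample):
    # Assumption: per-aspect relative closeness, averaged over the four aspects.
    scores = []
    for name in FEATURES:
        t, s = template[name], sample[name]
        denom = max(abs(t), abs(s), 1e-12)
        scores.append(1.0 - abs(t - s) / denom)
    return sum(scores) / len(scores)


def identify_speaker(sample, templates):
    """Try user 1, user 2, user 3 in order; return the first match or None (invalid)."""
    for user, template in templates:
        if similarity(template, sample) > SIMILARITY_THRESHOLD:
            return user
    return None


def authorize(wake_word_ok, wake_sample, command_sample, templates):
    """Return the user the instruction voice data belongs to, or None if invalid."""
    if not wake_word_ok:
        return None                                   # wrong wake-up word
    if identify_speaker(wake_sample, templates) is None:
        return None                                   # wake-up word data invalid
    return identify_speaker(command_sample, templates)


templates = [
    ("user 1", {"tone_quality": 0.5, "duration": 1.0, "intensity": 60.0, "pitch": 120.0}),
    ("user 2", {"tone_quality": 0.8, "duration": 1.2, "intensity": 70.0, "pitch": 220.0}),
]
sample = {"tone_quality": 0.8, "duration": 1.2, "intensity": 70.0, "pitch": 215.0}
speaker = authorize(True, sample, sample, templates)
```

Because the templates are tried in a fixed order and the search stops at the first match, a speaker enrolled as user 1 is identified after one comparison while an unknown speaker always costs three.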
Preferably, the text conversion processing unit cuts the audio into frames, matches the frames against the words and phrases in the text library of the storage module, and then converts them into text data.
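The framing step alone can be sketched as follows. The text gives no frame length or overlap, so the defaults below (400-sample frames advanced by 160 samples, a common choice of 25 ms and 10 ms at a 16 kHz sampling rate) are assumptions; the matching of frames against the text library is not shown.

```python
def split_into_frames(samples, frame_len=400, hop=160):
    """Cut sampled audio into overlapping fixed-length frames.

    frame_len and hop are illustrative assumptions, not values from the text.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


audio = list(range(1000))          # placeholder for digitized voice samples
frames = split_into_frames(audio)
```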
Preferably, the semantic parsing unit includes text preprocessing, text feature extraction, and classification model construction, where a dictionary table is preset in the text preprocessing: a sentence is split into multiple parts, each part is looked up in the dictionary table, and if the part is a word in the dictionary table, the word segmentation succeeds; otherwise, splitting and matching continue until the segmentation succeeds.
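One common way to realize dictionary-based splitting of this kind is forward maximum matching: take the longest candidate piece, look it up, and shorten it until a dictionary word is found. The text does not name a specific strategy, so this is an assumed instance; the function name `segment`, the `max_len` parameter and the single-character fallback for unknown text are illustrative.

```python
def segment(sentence, dictionary, max_len=5):
    """Split an unspaced sentence into dictionary words by forward maximum matching."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest piece first, then shorten it until it is in the dictionary.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if piece in dictionary or size == 1:   # size 1: fallback for unknown characters
                words.append(piece)
                i += size
                break
    return words


dictionary = {"turn", "on", "the", "light"}
words = segment("turnonthelight", dictionary)
```

For the example dictionary, the call splits "turnonthelight" into "turn", "on", "the" and "light".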
Preferably, the response time preset in the control module includes a wake-up word response time and a voice dialog response time, wherein the wake-up word response time is 200 ms to 500 ms, and the voice dialog response time is 650 ms to 1050 ms.
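The text does not describe how the control module applies the preset response times. One plausible reading, sketched below purely as an assumption, treats the upper bound of each range as a deadline and measures each stage against it; the names `within_budget` and `timed` are hypothetical.

```python
import time

# Preset response-time ranges from the text, in milliseconds
RESPONSE_WINDOWS_MS = {"wake_word": (200, 500), "voice_dialog": (650, 1050)}


def within_budget(kind, elapsed_ms):
    """True if the elapsed time does not exceed the upper bound of the preset range."""
    _, upper = RESPONSE_WINDOWS_MS[kind]
    return elapsed_ms <= upper


def timed(kind, handler, *args):
    """Run handler and report (result, elapsed_ms, met_deadline)."""
    start = time.perf_counter()
    result = handler(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms, within_budget(kind, elapsed_ms)


# Stand-in handler; a real system would run recognition here
result, elapsed_ms, ok = timed("voice_dialog", lambda text: text.upper(), "turn on light")
```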
(3) Advantageous effects
Compared with the prior art, the invention has the following beneficial effects: the voice control system can record voice samples of a plurality of users through the sample acquisition module, which makes it convenient for a plurality of users to control electronic products by voice; the voice recognition module processes and evaluates the voice data, which improves the precision of voiceprint comparison and makes the output instruction match the user's target instruction more closely; and the response time preset in the control module effectively suppresses the lengthening of the response time, shortens the time consumed in operating the electronic product by voice, and improves the practicability of voice control.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description illustrate only one embodiment of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an overall framework architecture of an embodiment of a voice control system of the present invention;
FIG. 2 is a schematic diagram of a sample acquisition module frame according to one embodiment of the voice control system of the present invention;
FIG. 3 is a block diagram of a frame structure of a speech recognition module according to an embodiment of the speech control system of the present invention;
FIG. 4 is a schematic diagram of a frame structure of a storage module according to an embodiment of the voice control system of the present invention.
Detailed Description
In order to make the technical means, inventive features, purposes and effects of the invention easy to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below to further illustrate the invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them.
Example 1
The voice control system of the electronic product for suppressing the response time delay in the embodiment has an overall frame structure as shown in fig. 1, and comprises a sample acquisition module, a voice acquisition module, a digital-to-analog conversion module, a storage module, a voice recognition module, a control module and a communication module,
the sample acquisition module comprises new sample creation, voice recording, feature extraction and model training, wherein a plurality of users can be established by the sample acquisition module through new sample creation, so that different users can conveniently send instructions to the electronic product, the voice recording is used for acquiring voice data of the users, the recorded voice content comprises wake-up words and keywords, and the feature extraction extracts voice features from the acquired voice data according to the particularity and the stability of the voice;
the voice acquisition module is used for acquiring voice data sent out in the surrounding environment;
the digital-to-analog conversion module is used for converting the acquired analog signals into digital signals that are convenient to process, reducing or weakening the influence of noise and improving the accuracy of the acquired voice data, and a conversion algorithm is preset in the digital-to-analog conversion module: the continuously varying signal x(t) is sampled into a time-discrete signal x(n) at a sampling rate fs = 2.5fmax, where fmax is the highest frequency component of x(t), the instantaneous analog value obtained by sampling is held for a period of time, the continuous-amplitude sampled signal is quantized into a digital signal that is discrete in both time and amplitude, which introduces a quantization error, and the quantized signal is encoded into a binary code for output;
the storage module comprises an instruction library, a model library and a text library, wherein each preset instruction for controlling the electronic product to complete corresponding operation is arranged in the instruction library and consists of an operation code and an address code, the model library contains personal voiceprint templates of all users, and the text library contains preset words or sentences;
instruction comparison rules are preset in the voice recognition module: the obtained input instruction I is compared in sequence with the instructions in an instruction library k, wherein the instructions in k comprise I1, I2, I3, ..., In, a first comparison is carried out between I and I1, if the matching degree P1 of I and I1 is not less than 70%, I1 is retained in the result, otherwise the result is 0, a second comparison is then carried out between I and I2, if the matching degree P2 of I and I2 is less than 70%, the first result is kept unchanged, if P2 is not less than 70%, the first result is taken into account: if I1 was retained in the first result, I2 is retained when P1 is not greater than P2 and I1 is retained otherwise, and if the first result was 0, I2 is retained, the comparison continues in this way until I has been compared with all the instructions in the instruction library k, the final result Ix is output as the final instruction, and if the final result is 0, the input instruction is treated as invalid;
the control module is used for commanding each module to complete sample collection, voice collection, digital-to-analog conversion, storage, voice recognition and communication work within a specified time according to requirements;
the communication module is used for sending the final instruction to the electronic product, so that the electronic product can make corresponding operation according to the voice of a user, and the response time is preset in the control module.
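Each preset instruction in the instruction library is described as consisting of an operation code and an address code. A minimal sketch of such an instruction word is given below; the field widths, the packing order and the class name `Instruction` are assumptions for illustration, since the text does not specify an encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    opcode: int    # operation code: which operation to perform
    address: int   # address code: which target the operation applies to

    def encode(self, address_bits=8):
        # Assumed layout: operation code in the high bits, address code in the low bits.
        return (self.opcode << address_bits) | self.address

    @staticmethod
    def decode(word, address_bits=8):
        mask = (1 << address_bits) - 1
        return Instruction(word >> address_bits, word & mask)


power_on = Instruction(opcode=0x02, address=0x10)   # hypothetical values
word = power_on.encode()
```

Decoding the encoded word with the same `address_bits` returns an equal `Instruction`, so the communication module could transmit the packed word and the electronic product could recover both fields.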
The particularity of the voice includes tone quality, duration, intensity and pitch. The model training models the speaker according to the voice features and establishes a personal voiceprint template specific to the user. The sample acquisition module can establish at most 3 users, namely user 1, user 2 and user 3. A voice recording end judgment rule is preset in the voice acquisition module: when no voice information is acquired within 1s, the voice acquisition module judges that the voice recording is finished, and when the total acquisition time is more than 15s, the voice acquisition module automatically stops the voice recording operation. The voice recognition module comprises a voiceprint recognition processing unit, a text conversion processing unit, a semantic parsing unit and an instruction comparison unit. The voiceprint recognition processing unit compares the voice data collected by the voice acquisition module with the personal voiceprint templates in the model library, the text conversion processing unit converts the voice data into text information, the semantic parsing unit performs semantic checking and processing on the text information to generate a corresponding target instruction, and the instruction comparison unit compares the target instruction generated by the semantic parsing unit with the instructions in the instruction library to judge whether the target instruction needs to be output.
Meanwhile, an identification algorithm is preset in the voiceprint recognition processing unit. First, it is judged whether the wake-up word is correct; if not, the voice data is invalid. If the wake-up word is correct, the personal voiceprint template of user 1 in the model library is called and compared with the wake-up word data collected by the voice acquisition module in four aspects, namely tone quality, duration, intensity and pitch. If the similarity value exceeds 95%, the wake-up word data is judged to belong to user 1; if the similarity value is lower than 95%, the personal voiceprint template of user 2 is called for the same comparison, and if the similarity value exceeds 95%, the wake-up word data is judged to belong to user 2; if the similarity value is lower than 95%, the personal voiceprint template of user 3 is called for the same comparison, and if the similarity value exceeds 95%, the wake-up word data is judged to belong to user 3; if the similarity value is still lower than 95%, the wake-up word data is judged to be invalid. If the wake-up word data is valid, the instruction voice data collected by the voice acquisition module is then compared in the same four aspects with the personal voiceprint templates of user 1, user 2 and user 3 in turn: as soon as the similarity value with a template exceeds 95%, the instruction voice data is judged to belong to the corresponding user, and if the similarity value is lower than 95% for all three templates, the instruction voice data is judged to be invalid.
In addition, the text conversion processing unit cuts the audio into frames, matches the frames against the words and phrases in the text library of the storage module, and then converts them into text data.
In addition, the semantic parsing unit comprises text preprocessing, text feature extraction and classification model construction. A dictionary table is preset in the text preprocessing: a sentence is split into multiple parts, each part is looked up in the dictionary table, and if the part is a word in the dictionary table, the word segmentation succeeds; otherwise, splitting and matching continue until the segmentation succeeds. The response time preset in the control module comprises a wake-up word response time and a voice dialog response time, wherein the wake-up word response time is 200 ms to 500 ms, and the voice dialog response time is 650 ms to 1050 ms.
A schematic diagram of a frame structure of a sample collection module of the speech control system is shown in fig. 2, a schematic diagram of a frame structure of a speech recognition module thereof is shown in fig. 3, and a schematic diagram of a frame structure of a storage module thereof is shown in fig. 4.
When the voice control system of this technical scheme is used, the voice acquisition module first acquires a voice command of the user. The digital-to-analog conversion module converts the continuously varying signal x(t) into a time-discrete sampled signal x(n) at a sampling rate fs = 2.5fmax, where fmax is the highest frequency component of x(t), holds the instantaneous analog value obtained by sampling for a period of time, quantizes the continuous-amplitude sampled signal into a digital signal that is discrete in both time and amplitude, which introduces a quantization error, and encodes the quantized signal into a binary code for output. The voice recognition module then compares the obtained input instruction I in sequence with the instructions in the instruction library k. In the first comparison, I is compared with I1: if the matching degree P1 of I and I1 is greater than or equal to 70%, I1 is retained in the result; otherwise the result is 0. In the second comparison, I is compared with I2: if P2 is less than 70%, the first result is kept unchanged; if P2 is greater than or equal to 70%, the first result is taken into account, so that if I1 was retained in the first result, I2 is retained when P1 is less than or equal to P2 and I1 is retained otherwise, and if the first result was 0, I2 is retained. The comparison continues in this way until I has been compared with all the instructions in the instruction library k, and the final result Ix is output as the final instruction; if the final result is 0, the input instruction is treated as invalid. Finally, the control module sends the instruction to be output to the electronic product through the communication module, so that the electronic product executes the corresponding operation.
Having thus described the principal technical features and basic principles of the invention, and the advantages associated therewith, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, but is capable of other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is presented in this way for clarity only, and those skilled in the art should take the description as a whole, since the technical solutions of the embodiments can be suitably combined to form other embodiments that those skilled in the art can understand.