Disclosure of Invention
In view of this, the present invention provides a motor control method, control system and control chip, so that the motor is controlled to generate a corresponding vibration only when the player himself fires a shot.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a motor control method comprising:
acquiring stereo audio data of a currently running game;
judging whether the stereo audio data contains specific voice or not;
if the specific voice is included, judging whether the specific voice is a first specific voice, wherein the specific voice comprises a gunshot, and the first specific voice comprises the gunshot of the player currently playing the game;
and if the specific voice is the first specific voice, controlling the motor to vibrate when the first specific voice is played.
Optionally, before determining whether the stereo audio data includes the specific voice, the method further includes:
dividing the stereo audio data into a first group of data, a second group of data and a third group of data, wherein the first group of data is one with the smallest absolute value in left channel data and right channel data in the stereo audio data, the second group of data is left channel data in the stereo audio data, and the third group of data is right channel data in the stereo audio data;
Judging whether the stereo audio data contains specific voice or not comprises the following steps: judging whether the stereo audio data contains specific voice or not according to the first group of data;
determining whether the particular speech is a first particular speech includes: and judging whether the specific voice is the first specific voice or not according to the second group of data and the third group of data.
Optionally, before determining whether the stereo audio data includes a specific voice according to the first set of data, the method further includes:
dividing the first set of data, the second set of data and the third set of data into multiple frames of data, wherein every N data points form one frame, and N is a natural number greater than 1;
carrying out low-pass filtering processing on each frame of data in the first group of data, carrying out band-pass filtering processing on each frame of data in the second group of data and the third group of data, and reserving data of a frequency band where specific voice is located in each frame of data;
and taking absolute values of N data in each frame of data in the first group of data, the second group of data and the third group of data after filtering, summing, and calculating average values of N data in each frame of data in the first group of data, the second group of data and the third group of data.
Optionally, determining whether the stereo audio data includes a specific voice according to the first set of data includes:
if the point at which the i-th frame data in the first set of data is located is a fast peak point, where i ≥ 0, the i-th frame data contains the specific voice;
if the point at which the i-th frame data in the first set of data is located is not a fast peak point but the point at which the (i-1)-th frame data in the first set of data is located is a peak point, where i ≥ 1, the (i-1)-th frame data contains the specific voice;
if the i-th frame data or the i-1-th frame data contains the specific voice, the stereo audio data contains the specific voice.
Optionally, determining that the point at which the i-th frame data in the first set of data is located is a fast peak point includes:
judging whether two conditions hold simultaneously: the average value of the i-th frame data in the first set of data is greater than or equal to a first preset value, and the average value of the data between the i-th frame data and the preceding adjacent trough point in the first set of data is less than or equal to the first preset value, where i ≥ 0;
if yes, the point at which the i-th frame data in the first set of data is located is a fast peak point.
Optionally, determining whether the point at which the (i-1)-th frame data in the first set of data is located is a trough point includes:
judging whether the average value of the (i-2)-th frame data in the first set of data is greater than the average value of the (i-1)-th frame data and the average value of the (i-1)-th frame data is less than or equal to the average value of the i-th frame data, where i ≥ 2;
if yes, determining the point at which the (i-1)-th frame data in the first set of data is located as a trough point.
Optionally, determining whether the point at which the (i-1)-th frame data in the first set of data is located is a peak point includes:
judging whether the average value of the (i-2)-th frame data in the first set of data is less than the average value of the (i-1)-th frame data and the average value of the (i-1)-th frame data is greater than or equal to the average value of the i-th frame data, where i ≥ 2;
if yes, determining the point at which the (i-1)-th frame data in the first set of data is located as a peak point.
Optionally, determining whether the specific voice is the first specific voice according to the second set of data and the third set of data includes:
judging whether the specific voice parameter of the jth frame data is smaller than a second preset value, if yes, judging that the specific voice in the jth frame data is the first specific voice;
if the average value of the j-th frame data in the second group of data is greater than or equal to the average value of the j-th frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the j-th frame data in the second group of data to the average value of the j-th frame data in the third group of data; if the average value of the j-th frame data in the second group of data is smaller than the average value of the j-th frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the j-th frame data in the third group of data to the average value of the j-th frame data in the second group of data.
Optionally, after determining that the specific voice is the first specific voice, the method further includes:
and determining the vibration sense of the motor according to the average value of the j-th frame data in the first group of data so as to control the motor to vibrate according to the corresponding vibration sense.
A motor control system, comprising:
the audio acquisition module is used for acquiring stereo audio data of the current running game;
the voice recognition module is used for judging whether the stereo audio data contains a specific voice and whether the specific voice is a first specific voice, and outputting a control instruction, wherein the specific voice comprises a gunshot, and the first specific voice comprises the gunshot of the player currently playing the game;
the motor driving chip is used for receiving the control instruction and controlling the motor to vibrate;
the audio power amplifier module is used for receiving the stereo audio data acquired by the audio acquisition module and controlling a loudspeaker to play the stereo audio data.
Optionally, the voice recognition module is further configured to divide the stereo audio data into a first set of data, a second set of data and a third set of data, where the first set of data is one of the stereo audio data with the smallest absolute value of the left channel data and the right channel data, the second set of data is the left channel data in the stereo audio data, the third set of data is the right channel data in the stereo audio data, and determine whether the stereo audio data includes a specific voice according to the first set of data, and determine whether the specific voice is a first specific voice according to the second set of data and the third set of data.
Optionally, before judging whether the stereo audio data includes a specific voice according to the first set of data, the voice recognition module is further configured to divide the first set of data, the second set of data and the third set of data into multiple frames of data, perform low-pass filtering processing on each frame of data in the first set of data, perform band-pass filtering processing on each frame of data in the second set of data and the third set of data, reserve data in a frequency band where the specific voice is located in each frame of data, sum N data in each frame of data in the first set of data, the second set of data and the third set of data after the filtering processing after absolute values are taken, and calculate an average value of N data in each frame of data in the first set of data, the second set of data and the third set of data;
wherein every N data is a frame, N is a natural number greater than 1.
Optionally, the voice recognition module is configured to determine that the stereo audio data includes the specific voice when the point at which the i-th frame data in the first set of data is located is a fast peak point, where i ≥ 0, or when the point at which the i-th frame data in the first set of data is located is not a fast peak point but the point at which the (i-1)-th frame data in the first set of data is located is a peak point, where i ≥ 1.
Optionally, the voice recognition module is configured to determine that the point at which the i-th frame data in the first set of data is located is a fast peak point when the average value of the i-th frame data in the first set of data is greater than or equal to a first preset value and the average value of the data between the i-th frame data and the preceding adjacent trough point in the first set of data is less than or equal to the first preset value, where i ≥ 0.
Optionally, the voice recognition module is configured to determine that the point at which the (i-1)-th frame data in the first set of data is located is a trough point when the average value of the (i-2)-th frame data in the first set of data is greater than the average value of the (i-1)-th frame data and the average value of the (i-1)-th frame data is less than or equal to the average value of the i-th frame data, where i ≥ 2.
Optionally, the voice recognition module is configured to determine that the point at which the (i-1)-th frame data in the first set of data is located is a peak point when the average value of the (i-2)-th frame data in the first set of data is less than the average value of the (i-1)-th frame data and the average value of the (i-1)-th frame data is greater than or equal to the average value of the i-th frame data, where i ≥ 2.
Optionally, the voice recognition module is configured to determine that a specific voice in the jth frame data is the first specific voice when a specific voice parameter of the jth frame data is smaller than a second preset value;
If the average value of the j-th frame data in the second group of data is greater than or equal to the average value of the j-th frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the j-th frame data in the second group of data to the average value of the j-th frame data in the third group of data; if the average value of the j-th frame data in the second group of data is smaller than the average value of the j-th frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the j-th frame data in the third group of data to the average value of the j-th frame data in the second group of data.
Optionally, after the specific voice is determined to be the first specific voice, the voice recognition module is further configured to determine a vibration sense of the motor according to an average value of the j-th frame data in the first set of data, so as to control the motor to perform vibration of the corresponding vibration sense through the control command.
A motor control chip comprises a processor and a memory;
the memory is used for storing computer execution instructions;
the processor is configured to perform the motor control method as set forth in any one of the above.
Compared with the prior art, the technical scheme provided by the invention has the following advantages:
According to the motor control method, control system and control chip provided by the present invention, the audio data of the currently running game is acquired, and it is judged whether the audio data contains a specific voice; if so, it is judged whether the specific voice is a first specific voice, wherein the specific voice comprises a gunshot and the first specific voice comprises the gunshot of the player currently playing the game; if so, the motor is controlled to vibrate when the first specific voice is played. In this way, the motor is controlled to vibrate correspondingly only when the player himself fires a shot, false vibrations caused by the gunshots of other players are shielded, human-computer interaction is enhanced, and the gunshot vibration of shooting games is made more engaging.
Detailed Description
The foregoing is a core idea of the present invention, and in order that the above-mentioned objects, features and advantages of the present invention can be more clearly understood, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention provides a motor control method, as shown in fig. 1, comprising the following steps:
s101: acquiring stereo audio data of a currently running game;
Taking a mobile phone game as an example, during the running of the game, different sounds are generated and played in response to the player's operations; for example, when the player fires, the game generates and plays a gunshot. Based on this, in the embodiment of the present invention, the audio data of the currently running game, that is, the stereo audio signal, may be obtained from the mobile phone system side. The audio data may be an audio data stream output after decoding audio files of various formats, and the data stream is then processed into signed 16-bit integers. Taking the PUBG Mobile game as an example, a stereo audio data stream with a 48 kHz sampling rate and a 16-bit sampling depth is obtained from the mobile phone system side. Of course, in other embodiments, the sampling rate and sampling depth may be set according to the actual situation, which will not be described here.
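The processing of the decoded stream into signed 16-bit integers might, for normalized floating-point samples, look like the following sketch. The input scaling is an assumption for illustration; the text does not specify the decoder's output format:

```python
def to_int16(samples):
    """Convert normalized float samples (assumed to lie in [-1.0, 1.0])
    to signed 16-bit integers, clamping to the int16 range."""
    return [max(-32768, min(32767, int(s * 32767))) for s in samples]
```

For example, `to_int16([0.0, 1.0, -1.0])` yields `[0, 32767, -32767]`.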
S102: judging whether the audio data stream contains specific voice or not, if so, entering S103;
s103: judging whether the specific voice is a first specific voice or not, wherein the specific voice comprises gunshot, the first specific voice comprises the gunshot of a player who plays the game currently, and if so, entering S104;
s104: when the first specific voice is played, the motor is controlled to vibrate.
After the stereo audio data of the currently running game is obtained, the sounds the game will play can be extracted from it, and it can then be recognized whether they contain the required specific voice, such as a gunshot, and whether that specific voice is the first specific voice, that is, the gunshot of the player himself. If it is, the motor is controlled to vibrate while the first specific voice, namely the player's own gunshot, is played; if it is instead the gunshot of another player, the motor is not controlled to vibrate. In the embodiment of the present invention, the specific voice is taken as a gunshot and the first specific voice as the player's own gunshot, but the present invention is not limited thereto.
When playing a sound, the game generates stereo audio whose left-channel and right-channel parameters differ according to the distance and angle between the sound source and the player. For example, a sound emitted by the player himself shows essentially no volume difference between the left and right channels across the full frequency band; a sound emitted by another player on the player's left has a larger volume in the high-frequency part of the left channel and a smaller volume in the corresponding high-frequency band of the right channel; and a sound emitted by another player on the player's right has a larger volume in the high-frequency part of the right channel and a smaller volume in the corresponding high-frequency band of the left channel.
Based on this, in the embodiment of the present invention, whether the current voice contains the specific voice may be determined according to the stereo data, and whether the specific voice is the first specific voice may be determined according to the left channel data and the right channel data in the stereo data. In order to reduce the data amount for judging whether the current voice contains the specific voice or not, and improve the calculation efficiency, in some embodiments of the present invention, the left channel data and the right channel data in the stereo audio data are reduced, so as to judge whether the current voice contains the specific voice or not through calculating the smaller data.
That is, in some embodiments of the present invention, before determining whether the stereo audio data includes the specific speech, the method further includes: the stereo audio data is divided into a first group of data, a second group of data and a third group of data, wherein the first group of data is one with the smallest absolute value in left channel data and right channel data in the stereo audio data, the second group of data is left channel data in the stereo audio data, and the third group of data is right channel data in the stereo audio data.
And, judging whether the stereo audio data contains specific voice includes: judging whether the audio data contains specific voice or not according to the first group of data; determining whether the particular speech is the first particular speech includes: and judging whether the specific voice is the first specific voice or not according to the second group of data and the third group of data.
The calculation formula for taking the smallest absolute value of the left channel data and the right channel data is as follows:
if abs(audio_L(n)) ≤ abs(audio_R(n)), then audio(n) = audio_L(n);
if abs(audio_L(n)) > abs(audio_R(n)), then audio(n) = audio_R(n);
where n = 0, 1, 2, …, audio(n) represents the first set of data, audio_L(n) represents the left channel data, audio_R(n) represents the right channel data, and abs() represents the absolute value.
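As a minimal sketch, the channel-selection formula above can be written directly in Python (the function name is illustrative, not from the source):

```python
def min_abs_channel(audio_l, audio_r):
    """Build the first set of data: for each sample n, keep the
    left-channel value when abs(audio_L(n)) <= abs(audio_R(n)),
    otherwise keep the right-channel value."""
    return [l if abs(l) <= abs(r) else r for l, r in zip(audio_l, audio_r)]
```

For example, `min_abs_channel([100, -50], [-80, 60])` yields `[-80, -50]`.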
On the basis, before judging whether the stereo audio data contains specific voice according to the first group of data, the method further comprises the following steps:
dividing the first set of data, the second set of data and the third set of data into multiple frames of data, wherein every N data points form one frame, a final frame with fewer than N data points is zero-padded at the end, and N is a natural number greater than 1; for ease of calculation, N may be 1024; the first set of data, the second set of data and the third set of data are divided into the same number of frames;
and carrying out low-pass filtering processing on each frame of data in the first set of data, and band-pass filtering processing on each frame of data in the second set of data and the third set of data, so as to retain the data of the frequency band in which the specific voice is located in each frame and filter out the data of other frequency bands, the specific voice including the first specific voice. Because the primary frequency of the gunshot in the game lies in the range of 60 Hz-200 Hz, in some embodiments of the invention the cut-off frequency of the low-pass filter is 225 Hz, which filters out most of the human voice and background sound in the game. Because the stereo sound of gunshots fired by other players in the game is attenuated in the frequency range of 2800 Hz-3600 Hz, in some embodiments of the invention the band-pass filtering range is 2800 Hz-3600 Hz, so as to filter out part of the sound of other players and improve the efficiency of recognizing the player's own gunshot.
And then taking absolute values of N data in each frame of data in the first group of data, the second group of data and the third group of data after filtering, summing the absolute values, and calculating average values of N data in each frame of data in the first group of data, the second group of data and the third group of data.
That is, the absolute values of the N data in each frame of the low-pass-filtered first set of data are summed to obtain SUM(i), and the average value AVE(i) of the N data in each frame of the first set of data is calculated; the absolute values of the N data in each frame of the band-pass-filtered second set of data are summed to obtain SUM_L(i), and the average value AVE_L(i) of the N data in each frame of the second set of data is calculated; and the absolute values of the N data in each frame of the band-pass-filtered third set of data are summed to obtain SUM_R(i), and the average value AVE_R(i) of the N data in each frame of the third set of data is calculated; where i = 0, 1, 2, …, AVE(i) = SUM(i)/N, AVE_L(i) = SUM_L(i)/N, AVE_R(i) = SUM_R(i)/N.
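The framing, zero-padding, absolute-value summation and averaging can be sketched as follows. The filtering step is omitted here for brevity; in a real implementation the low-pass or band-pass filter would be applied before the averages are computed:

```python
def frame_averages(data, n=1024):
    """Split data into frames of N samples, zero-padding the final
    frame, then compute AVE(i) = SUM(i)/N for each frame i, where
    SUM(i) is the sum of absolute values of the N samples in frame i."""
    padded = list(data) + [0] * (-len(data) % n)  # zero-pad the tail
    return [sum(abs(x) for x in padded[k:k + n]) / n
            for k in range(0, len(padded), n)]
```

For example, `frame_averages([1, -1, 2, -2], n=2)` yields `[1.0, 2.0]`.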
After obtaining the average value AVE (i) of N data in each frame of data in the first set of data, it may be determined whether the stereo audio data includes a specific voice according to the average value AVE (i), and in some embodiments of the present invention, determining whether the stereo audio data includes a specific voice according to the first set of data includes:
if the point at which the i-th frame data in the first set of data is located is a fast peak point, where i ≥ 0, the i-th frame data contains the specific voice;
if the point at which the i-th frame data in the first set of data is located is not a fast peak point but the point at which the (i-1)-th frame data in the first set of data is located is a peak point, where i ≥ 1, the (i-1)-th frame data contains the specific voice;
if the i-th frame data or the i-1 th frame data contains a specific voice, the stereo audio data contains a specific voice.
That is, it is judged whether the point at which the i-th frame data in the first set of data is located is a fast peak point, where i ≥ 0; if it is a fast peak point, the i-th frame data contains the specific voice. If it is not a fast peak point, it is judged whether the point at which the (i-1)-th frame data in the first set of data is located is a peak point, where i ≥ 1; if it is a peak point, the (i-1)-th frame data contains the specific voice. If the i-th frame data or the (i-1)-th frame data contains the specific voice, the stereo audio data contains the specific voice.
In some embodiments of the present invention, determining that a point at which the i-th frame data in the first set of data is located is a fast peak point includes:
judging whether two conditions hold simultaneously: the average value of the i-th frame data in the first set of data is greater than or equal to a first preset value, and the average value of the data between the i-th frame data in the first set of data and the preceding adjacent trough point is less than or equal to the first preset value, where i ≥ 0; if yes, the point at which the i-th frame data in the first set of data is located is a fast peak point.
When the specific voice is gunshot, the first preset value is a trigger threshold value for judging the strongest vibration sense of the gunshot vibration. Optionally, the first preset value is 3500, which is not limited to this, and may be set according to actual situations in different application scenarios.
In some embodiments of the present invention, determining whether the point at which the (i-1)-th frame data in the first set of data is located is a peak point includes:
judging whether the average value of the (i-2)-th frame data in the first set of data is less than the average value of the (i-1)-th frame data and the average value of the (i-1)-th frame data is greater than or equal to the average value of the i-th frame data, where i ≥ 2; if yes, determining the point at which the (i-1)-th frame data in the first set of data is located as a peak point.
In some embodiments of the present invention, determining whether the point at which the (i-1)-th frame data in the first set of data is located is a trough point includes:
judging whether the average value of the (i-2)-th frame data in the first set of data is greater than the average value of the (i-1)-th frame data and the average value of the (i-1)-th frame data is less than or equal to the average value of the i-th frame data, where i ≥ 2; if yes, determining the point at which the (i-1)-th frame data in the first set of data is located as a trough point.
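The peak and trough tests amount to a local-extremum check on three consecutive frame averages; a minimal sketch (function and label names are illustrative):

```python
def classify_frame(ave_i2, ave_i1, ave_i):
    """Classify the (i-1)-th frame from the averages of frames i-2,
    i-1 and i: a peak point if AVE(i-2) < AVE(i-1) >= AVE(i), a trough
    point if AVE(i-2) > AVE(i-1) <= AVE(i), otherwise neither."""
    if ave_i2 < ave_i1 >= ave_i:
        return "peak"
    if ave_i2 > ave_i1 <= ave_i:
        return "trough"
    return None
```

For example, `classify_frame(1, 3, 2)` yields `"peak"` and `classify_frame(3, 1, 2)` yields `"trough"`.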
When a frame of data contains a specific voice such as a gunshot, its average value is relatively large, so the point at which it is located is generally a peak point, and the point between adjacent specific voices, such as adjacent gunshots, is generally a trough point. However, if the presence of the specific voice were judged only from peak points, the mobile phone system side would have to wait for the (i+1)-th frame data to be output before the point at which the i-th frame data is located could be judged to be a peak point, so the judgment would take a long time.
Based on this, in some embodiments of the present invention, whether the point at which the i-th frame data is located is a fast peak point is determined by judging whether the average value of the i-th frame data is greater than or equal to the first preset value. Because detecting a fast peak point does not depend on the following frame, the presence of the specific voice can be determined without waiting for the mobile phone system side to output the (i+1)-th frame data, so the specific voice in the data can be detected quickly.
It should be noted that, after determining that the average value of the i-th frame data in the first set of data is greater than or equal to the first preset value, it is further necessary to judge whether the average value of the data between the i-th frame data and the preceding adjacent trough point is less than or equal to the first preset value before the point at which the i-th frame data is located can be identified as a fast peak point. This is because points whose average value is greater than or equal to the first preset value exist not only on the left side of a peak point but also on its right side. If the average value of the data between the i-th frame data and the preceding adjacent trough point is greater than the first preset value, a fast peak point on the left side of the peak has already been detected and the data has already been determined to contain the specific voice, so the judgment need not be repeated. If, however, that average value is less than or equal to the first preset value, the peak has not yet been judged to contain the specific voice, and the subsequent judgment flow must be entered.
It should also be noted that the judged data is limited to the data between the i-th frame data and the preceding adjacent trough point in the first set of data because the audio data may contain a plurality of specific voices, such as a plurality of gunshots, and the point between adjacent specific voices, such as adjacent gunshots, is generally a trough point. Limiting the judgment to this range avoids re-judging specific voices that have already been judged.
In some embodiments, determining whether the stereo audio data includes a particular speech according to the first set of data includes:
judging whether the average value AVE(i-2) of the (i-2)-th frame data in the first set of data is less than the average value AVE(i-1) of the (i-1)-th frame data and the average value AVE(i-1) of the (i-1)-th frame data is greater than or equal to the average value AVE(i) of the i-th frame data, where i ≥ 2;
if yes, determining the point at which the (i-1)-th frame data in the first set of data is located as a peak point, and setting the peak flag Peak_flag to 1 and the trough flag Valley_flag to 0;
if not, judging whether the average value AVE(i-2) of the (i-2)-th frame data in the first set of data is greater than the average value AVE(i-1) of the (i-1)-th frame data and the average value AVE(i-1) of the (i-1)-th frame data is less than or equal to the average value AVE(i) of the i-th frame data;
if yes, determining the point at which the (i-1)-th frame data in the first set of data is located as a trough point, and setting the fast peak flag FastPeak_flag to 0, the peak flag Peak_flag to 0 and the trough flag Valley_flag to 1;
if not, judging whether the average value AVE(i) of the i-th frame data in the first set of data is greater than the first preset value and the fast peak flag FastPeak_flag is 0, where i ≥ 0;
if yes, determining that the point at which the i-th frame data in the first set of data is located is a fast peak point, and setting the fast peak flag FastPeak_flag to 1 and the trough flag Valley_flag to 0;
judging whether the fast peak flag FastPeak_flag is 1, if yes, judging that the ith frame data contains specific voice, and if not, judging that the ith frame data does not contain specific voice.
In some embodiments of the present invention, after determining that the ith frame data does not include a specific voice, the method further includes:
judging whether the two conditions that the peak flag Peak_flag is equal to 1 and the fast peak flag FastPeak_flag is equal to 0 hold simultaneously; if so, it is determined that the (i-1)-th frame data contains the specific voice, and if not, it is determined that the (i-1)-th frame data does not contain the specific voice.
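The flag-based judgment flow described above can be sketched as follows. This is one literal reading of the embodiment, using the stated first preset value of 3500 and the FastPeak_flag == 0 condition as the text's stand-in for the trough-to-frame average check; a production implementation may differ:

```python
def detect_specific_voice(aves, first_preset=3500):
    """Run the peak/trough/fast-peak flag flow over the frame averages
    AVE(i) of the first set of data and return the indices of frames
    judged to contain the specific voice."""
    fastpeak_flag = peak_flag = valley_flag = 0
    hits = []
    for i, ave in enumerate(aves):
        if i >= 2 and aves[i - 2] < aves[i - 1] >= ave:
            # the (i-1)-th frame is a peak point
            peak_flag, valley_flag = 1, 0
        elif i >= 2 and aves[i - 2] > aves[i - 1] <= ave:
            # the (i-1)-th frame is a trough point; reset the flags
            fastpeak_flag, peak_flag, valley_flag = 0, 0, 1
        elif ave > first_preset and fastpeak_flag == 0:
            # the i-th frame is a fast peak point
            fastpeak_flag, valley_flag = 1, 0
        if fastpeak_flag == 1:
            hits.append(i)        # i-th frame contains the voice
        elif peak_flag == 1:
            hits.append(i - 1)    # (i-1)-th frame contains the voice
    return hits
```

Note that, as in the text, FastPeak_flag persists until the next trough point resets it, so consecutive frames after a fast peak keep reporting the specific voice until a trough is seen.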
Similarly, if the i-th frame data or the i-1 th frame data contains a specific voice, it is determined that the stereo audio data contains a specific voice.
After determining that the i-th frame data contains a specific voice, further comprising: determining whether a specific voice contained in the i-th frame data is a first specific voice; after the i-1 st frame data is determined to contain the specific voice, the method further comprises: it is determined whether or not the specific voice contained in the i-1 st frame data is the first specific voice.
In some embodiments of the present invention, determining whether the particular speech is the first particular speech based on the second set of data and the third set of data includes:
judging whether the specific voice parameter of the j-th frame data is smaller than a second preset value, and if so, determining that the specific voice in the j-th frame data is the first specific voice;
if the average value of the j-th frame data in the second group of data is greater than or equal to the average value of the j-th frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the j-th frame data in the second group of data to the average value of the j-th frame data in the third group of data; if the average value of the j-th frame data in the second group of data is smaller than the average value of the j-th frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the j-th frame data in the third group of data to the average value of the j-th frame data in the second group of data.
In some embodiments, determining whether the particular speech is the first particular speech based on the second set of data and the third set of data includes:
Judging whether the average value AVE_L(j) of the j-th frame data in the second group of data is greater than or equal to the average value AVE_R(j) of the j-th frame data in the third group of data, wherein j is greater than or equal to 0;
if yes, that is, when the average value AVE_L(j) of the j-th frame data in the second group of data is greater than or equal to the average value AVE_R(j) of the j-th frame data in the third group of data, making the specific voice parameter self_shoot_flag equal to the ratio of the average value AVE_L(j) of the j-th frame data in the second group of data to the average value AVE_R(j) of the j-th frame data in the third group of data, that is, letting self_shoot_flag = AVE_L(j)/AVE_R(j);
if not, that is, when the average value AVE_L(j) of the j-th frame data in the second group of data is smaller than the average value AVE_R(j) of the j-th frame data in the third group of data, making the specific voice parameter self_shoot_flag equal to the ratio of the average value AVE_R(j) of the j-th frame data in the third group of data to the average value AVE_L(j) of the j-th frame data in the second group of data, that is, letting self_shoot_flag = AVE_R(j)/AVE_L(j);
judging whether the specific voice parameter self_shoot_flag is smaller than a second preset value, and if so, determining that the j-th frame data contains the first specific voice. When the specific voice is a gunshot, the second preset value may be selected to be 1.08.
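The ratio test above amounts to checking that the left and right channel averages are nearly equal; a minimal sketch, assuming the function name and the default threshold of 1.08 mentioned in the text:

```python
def is_first_specific_voice(ave_l, ave_r, second_preset=1.08):
    """Compute self_shoot_flag as the ratio of the larger channel average to
    the smaller one, and return True when it is below the second preset value,
    i.e. when the gunshot is approximately centered between the channels."""
    self_shoot_flag = ave_l / ave_r if ave_l >= ave_r else ave_r / ave_l
    return self_shoot_flag < second_preset
```

The intuition is that the player's own gunshot is rendered almost identically in both channels, while another player's gunshot arrives panned toward one side.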
That is, if it is determined that the point where the i-th frame data is located is a fast peak point, it may be determined that the i-th frame data contains the specific voice, and then it is judged whether the average value AVE_L(i) of the i-th frame data in the second group of data is greater than or equal to the average value AVE_R(i) of the i-th frame data in the third group of data, wherein i is greater than or equal to 0; if yes, let self_shoot_flag = AVE_L(i)/AVE_R(i); if not, let self_shoot_flag = AVE_R(i)/AVE_L(i); it is further judged whether the specific voice parameter self_shoot_flag is smaller than the second preset value, and if so, it is determined that the i-th frame data contains the first specific voice.
When it is determined that the point where the i-1 th frame data is located is a peak point rather than a fast peak point, it may be determined that the i-1 th frame data contains the specific voice, and then it is judged whether the average value AVE_L(i-1) of the i-1 th frame data in the second group of data is greater than or equal to the average value AVE_R(i-1) of the i-1 th frame data in the third group of data, wherein i-1 is greater than or equal to 0; if yes, let self_shoot_flag = AVE_L(i-1)/AVE_R(i-1); if not, let self_shoot_flag = AVE_R(i-1)/AVE_L(i-1); it is further judged whether the specific voice parameter self_shoot_flag is smaller than the second preset value, and if so, it is determined that the i-1 th frame data contains the first specific voice.
In some embodiments of the present invention, after determining that the jth frame data includes the first specific voice, the method further includes: and determining the vibration sense of the motor according to the average value of the j-th frame data in the first group of data so as to control the motor to vibrate according to the corresponding vibration sense.
Specifically, when the point where the j-th frame data is located is a fast peak point and the j-th frame data is determined to contain the first specific voice, the motor is controlled to vibrate at the j-th frame with the corresponding vibration sense; when the point where the j-1 th frame data is located is a peak point rather than a fast peak point and the j-1 th frame data is determined to contain the first specific voice, the motor is likewise controlled to vibrate at the j-th frame with the corresponding vibration sense.
Because the sounds emitted by different types of guns are different, in some embodiments of the invention, the motor is controlled to vibrate with different vibration senses according to different gunshots, so as to improve the game experience. After determining that the j-th frame data contains the first specific voice, the motor vibration sense determination data PeakData is made equal to the average value AVE(j) of the j-th frame data in the first group of data, that is, PeakData = AVE(j). If 1000 ≤ PeakData < 2100, the motor vibration sense is made a first value; if 2100 ≤ PeakData < 2500, the motor vibration sense is made a second value; if 2500 ≤ PeakData < 3500, the motor vibration sense is made a third value; if PeakData ≥ 3500, the motor vibration sense is made a fourth value; if PeakData takes any other value, the motor does not vibrate or stops vibrating.
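The threshold mapping above can be sketched as follows; the numeric level codes returned here stand in for the first to fourth vibration-sense values, which the text leaves unspecified.

```python
def vibration_level(peak_data):
    """Map PeakData = AVE(j) onto a vibration-sense level (0 = no vibration)."""
    if 1000 <= peak_data < 2100:
        return 1  # first value
    if 2100 <= peak_data < 2500:
        return 2  # second value
    if 2500 <= peak_data < 3500:
        return 3  # third value
    if peak_data >= 3500:
        return 4  # fourth value
    return 0      # any other value: the motor does not vibrate or stops vibrating
```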
Optionally, the first value, the second value, the third value and the fourth value increase in sequence, and the greater the average value AVE(j), the stronger the vibration intensity of the motor. However, the present invention is not limited thereto; in other embodiments, the vibration intensities of the motor for different average values AVE(j) are the same, but the larger the average value AVE(j), the longer the vibration duration of the motor, so that the vibration senses of the motor for different average values AVE(j) still differ. In the embodiment of the present invention, the stereo audio data of the currently running game is continuously acquired, and the stereo audio data is continuously grouped, framed, filtered, etc.; each frame of data is judged in time sequence, and if a frame of data is determined to contain the first specific voice, the motor is controlled to vibrate accordingly. Regardless of whether the judging result indicates that the current data contains the first specific voice, the subsequently output stereo audio data continues to be judged until the currently running game is stopped or closed.
The embodiment of the invention also provides a motor control system, as shown in fig. 2, comprising:
an audio acquisition module 20, configured to acquire stereo audio data of a currently running game;
a voice recognition module 21, configured to determine whether the stereo audio data includes a specific voice and whether the specific voice is a first specific voice, and output a control instruction, where the specific voice includes a gunshot, and the first specific voice includes a gunshot of a player who is currently playing the game;
a motor driving chip 23 for receiving the control command outputted from the voice recognition module 21 and controlling the motor 25 to vibrate;
the audio power amplifier module 22 is configured to receive the stereo audio data acquired by the audio acquisition module 20, and control the speaker 24 to play the stereo audio data.
It should be noted that, after the audio acquisition module 20 acquires the stereo audio data of the currently running game, the stereo audio data is transmitted to the audio power amplifier module 22, and the audio power amplifier module 22 controls the speaker 24 to play the stereo audio data. The audio acquisition module 20 transmits the stereo audio data to the audio power amplifier module 22 and, at the same time, to the voice recognition module 21. The voice recognition module 21 determines whether the stereo audio data contains the specific voice and whether the specific voice is the first specific voice; if the stereo audio data contains the specific voice and the specific voice is the first specific voice, a control instruction is output to the motor driving chip 23, and the motor driving chip 23 controls the motor 25 to vibrate after receiving the control instruction. Since the recognition speed of the voice recognition module 21 is fast, the motor 25 can be controlled to vibrate while the speaker 24 plays the stereo audio data.
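The dataflow just described can be sketched as a small driver loop; the class and callback names are illustrative assumptions, not the disclosed chip interfaces.

```python
class MotorControlSystem:
    """Toy model of Fig. 2: audio is always played, and the motor is driven
    only when the recognizer reports the first specific voice."""

    def __init__(self, recognizer, play_audio, drive_motor):
        self.recognizer = recognizer    # voice recognition module 21
        self.play_audio = play_audio    # audio power amplifier 22 + speaker 24
        self.drive_motor = drive_motor  # motor driving chip 23 + motor 25

    def on_audio(self, stereo_frame):
        # The audio acquisition module 20 feeds both paths in parallel.
        self.play_audio(stereo_frame)
        if self.recognizer(stereo_frame):
            self.drive_motor(stereo_frame)
```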
Taking a mobile phone game as an example, during the running process of the game, different voices are generated and played by the game along with the operation of a player, for example, when the player shoots, the game generates and plays gunshots. Based on this, the audio acquisition module 20 acquires audio data of the currently running game, that is, a stereo audio signal, from the mobile phone system side, where the audio data may be an audio data stream outputted after decoding audio files in various formats.
In some embodiments of the present invention, the voice recognition module 21 is further configured to divide the stereo audio data into a first set of data, a second set of data and a third set of data, where the first set of data is one of the stereo audio data having the smallest absolute value of the left channel data and the right channel data, the second set of data is the left channel data in the stereo audio data, the third set of data is the right channel data in the stereo audio data, and determine whether the stereo audio data includes a specific voice according to the first set of data, and determine whether the specific voice is the first specific voice according to the second set of data and the third set of data.
On this basis, before judging whether the stereo audio data contains specific voice according to the first set of data, the voice recognition module 21 is further configured to divide the first set of data, the second set of data and the third set of data into multi-frame data, perform low-pass filtering processing on each frame of data in the first set of data, perform band-pass filtering processing on each frame of data in the second set of data and the third set of data, reserve data in a frequency band where the specific voice is located in each frame of data, sum up after taking absolute values of N data in each frame of data in the first set of data, the second set of data and the third set of data after the filtering processing, and calculate an average value of N data in each frame of data in the first set of data, the second set of data and the third set of data;
Wherein every N data is a frame, N is a natural number greater than 1.
That is, the voice recognition module 21 is further configured to divide the first set of data, the second set of data, and the third set of data into multiple frames of data, where every N data form a frame, fewer than N data at the end are zero-padded to a full frame, and N is a natural number greater than 1; to perform low-pass filtering processing on each frame of data in the first group of data, and band-pass filtering processing on each frame of data in the second group of data and the third group of data, reserving the data of the frequency band where the specific voice is located in each frame of data and filtering out the data of other frequency bands, the specific voice including the first specific voice; and to take the absolute values of the N data in each frame of data in the first group of data after the low-pass filtering processing and sum them to obtain SUM(i), calculate the average value AVE(i) of the N data in each frame of data in the first group of data, take the absolute values of the N data in each frame of data in the second group of data after the band-pass filtering processing and sum them to obtain SUM_L(i), calculate the average value AVE_L(i) of the N data in each frame of data in the second group of data, take the absolute values of the N data in each frame of data in the third group of data after the band-pass filtering processing and sum them to obtain SUM_R(i), and calculate the average value AVE_R(i) of the N data in each frame of data in the third group of data. Where i = 0, 1, 2, …, AVE(i) = SUM(i)/N, AVE_L(i) = SUM_L(i)/N, and AVE_R(i) = SUM_R(i)/N.
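A hedged sketch of the framing and averaging step (filtering omitted for brevity): frames of N samples are zero-padded at the tail, and AVE(i) = SUM(i)/N is the mean of absolute values per frame. The function name is an assumption of this sketch.

```python
def frame_averages(samples, n):
    """Split samples into frames of n values, zero-padding the last frame,
    and return the per-frame averages AVE(i) = SUM(i) / n."""
    padded = list(samples) + [0] * (-len(samples) % n)
    return [sum(abs(x) for x in padded[k:k + n]) / n
            for k in range(0, len(padded), n)]
```

For example, frame_averages([1, -1, 2, -2, 3], 2) pads the input to six samples and yields [1.0, 2.0, 1.5].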
In some embodiments of the present invention, the voice recognition module 21 is configured to determine that the stereo audio data contains the specific voice when the point where the i-th frame data in the first set of data is located is a fast peak point, wherein i is greater than or equal to 0; or to determine that the stereo audio data contains the specific voice when the point where the i-th frame data in the first set of data is located is not a fast peak point but the point where the i-1 th frame data in the first set of data is located is a peak point, wherein i is greater than or equal to 1.
In some embodiments of the present invention, the voice recognition module 21 is configured to determine that a point where the i-th frame data in the first set of data is located is a fast peak point when an average value of the i-th frame data in the first set of data is greater than or equal to a first preset value and an average value of data between the i-th frame data in the first set of data and a neighboring valley point before the i-th frame data in the first set of data is less than or equal to the first preset value; i is more than or equal to 0.
In some embodiments of the present invention, the speech recognition module 21 is configured to determine that the point where the i-1 th frame data in the first set of data is located is a valley point when the average value of the i-2 th frame data in the first set of data is greater than the average value of the i-1 th frame data and the average value of the i-1 th frame data is less than or equal to the average value of the i-th frame data, wherein i is greater than or equal to 2.
In some embodiments of the present invention, the speech recognition module 21 is configured to determine that the point where the i-1 th frame data in the first set of data is located is a peak point when the average value of the i-2 th frame data in the first set of data is smaller than the average value of the i-1 th frame data and the average value of the i-1 th frame data is greater than or equal to the average value of the i-th frame data, wherein i is greater than or equal to 2.
In some embodiments, the speech recognition module 21 is configured to judge whether two conditions that the average value AVE(i-2) of the i-2 th frame data in the first set of data is smaller than the average value AVE(i-1) of the i-1 th frame data, and that the average value AVE(i-1) of the i-1 th frame data is greater than or equal to the average value AVE(i) of the i-th frame data, are simultaneously satisfied, wherein i is greater than or equal to 2;
if yes, determine that the point where the i-1 th frame data in the first group of data is located is a peak point, and set the peak flag Peak_flag to 1 and the valley flag Valley_flag to 0;
if not, judge whether two conditions that the average value AVE(i-2) of the i-2 th frame data in the first group of data is greater than the average value AVE(i-1) of the i-1 th frame data, and that the average value AVE(i-1) of the i-1 th frame data is less than or equal to the average value AVE(i) of the i-th frame data, are simultaneously satisfied;
if yes, determine that the point where the i-1 th frame data in the first group of data is located is a valley point, and set the fast peak flag FastPeak_flag to 0, the peak flag Peak_flag to 0 and the valley flag Valley_flag to 1;
if not, judge whether two conditions that the average value AVE(i) of the i-th frame data in the first group of data is greater than a first preset value, and that the fast peak flag FastPeak_flag is 0, are simultaneously satisfied, wherein i is greater than or equal to 0;
if yes, determine that the point where the i-th frame data in the first group of data is located is a fast peak point, and set the fast peak flag FastPeak_flag to 1 and the valley flag Valley_flag to 0;
judge whether the fast peak flag FastPeak_flag is 1; if yes, determine that the i-th frame data contains the specific voice, and if not, determine that the i-th frame data does not contain the specific voice.
On this basis, after determining that the i-th frame data does not contain the specific voice, the voice recognition module 21 is further configured to judge whether two conditions that the peak flag Peak_flag is equal to 1 and that the fast peak flag FastPeak_flag is equal to 0 are simultaneously satisfied; if so, it determines that the i-1 th frame data contains the specific voice, and if not, it determines that the i-1 th frame data does not contain the specific voice. If it is determined that the i-th frame data or the i-1 th frame data contains the specific voice, the voice recognition module 21 determines that the stereo audio data contains the specific voice.
Based on this, in some embodiments of the present invention, the voice recognition module 21 is further configured to determine that the specific voice in the j-th frame data is the first specific voice when the specific voice parameter of the j-th frame data is smaller than the second preset value;
if the average value of the j-th frame data in the second group of data is greater than or equal to the average value of the j-th frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the j-th frame data in the second group of data to the average value of the j-th frame data in the third group of data; if the average value of the j-th frame data in the second group of data is smaller than the average value of the j-th frame data in the third group of data, the specific voice parameter is equal to the ratio of the average value of the j-th frame data in the third group of data to the average value of the j-th frame data in the second group of data.
In some embodiments, the voice recognition module 21 is configured to judge whether the average value AVE_L(j) of the j-th frame data in the second set of data is greater than or equal to the average value AVE_R(j) of the j-th frame data in the third set of data, wherein j is greater than or equal to 0; if yes, make the specific voice parameter self_shoot_flag equal to the ratio of the average value AVE_L(j) of the j-th frame data in the second set of data to the average value AVE_R(j) of the j-th frame data in the third set of data, that is, let self_shoot_flag = AVE_L(j)/AVE_R(j); if not, make the specific voice parameter self_shoot_flag equal to the ratio of the average value AVE_R(j) of the j-th frame data in the third set of data to the average value AVE_L(j) of the j-th frame data in the second set of data, that is, let self_shoot_flag = AVE_R(j)/AVE_L(j); and judge whether the specific voice parameter self_shoot_flag is smaller than the second preset value, and if so, determine that the j-th frame data contains the first specific voice. When the specific voice is a gunshot, the second preset value may be selected to be 1.08.
That is, after determining that the i-th frame data contains the specific voice, the voice recognition module 21 is configured to judge whether the average value AVE_L(i) of the i-th frame data in the second set of data is greater than or equal to the average value AVE_R(i) of the i-th frame data in the third set of data, wherein i is greater than or equal to 0;
if yes, let self_shoot_flag = AVE_L(i)/AVE_R(i);
if not, let self_shoot_flag = AVE_R(i)/AVE_L(i);
and judge whether the specific voice parameter self_shoot_flag is smaller than the second preset value, and if so, determine that the i-th frame data contains the first specific voice.
After determining that the i-1 th frame data contains the specific voice, the voice recognition module 21 is configured to judge whether the average value AVE_L(i-1) of the i-1 th frame data in the second set of data is greater than or equal to the average value AVE_R(i-1) of the i-1 th frame data in the third set of data, wherein i-1 is greater than or equal to 0;
if yes, let self_shoot_flag = AVE_L(i-1)/AVE_R(i-1);
if not, let self_shoot_flag = AVE_R(i-1)/AVE_L(i-1);
and judge whether the specific voice parameter self_shoot_flag is smaller than the second preset value, and if so, determine that the i-1 th frame data contains the first specific voice.
In some embodiments of the present invention, after determining that the specific voice is the first specific voice, the voice recognition module 21 is further configured to determine a vibration sense of the motor according to an average value of the j-th frame data in the first set of data, so as to control the motor to perform vibration of the corresponding vibration sense through the control command.
In some embodiments, after the speech recognition module 21 determines that the j-th frame data contains the first specific voice, the motor vibration sense determination data PeakData is made equal to the average value AVE(j) of the j-th frame data in the first group of data, that is, PeakData = AVE(j). It is then judged whether PeakData is greater than or equal to 1000 and less than 2100; if yes, the motor vibration sense is made a first value; if not, it is judged whether PeakData is greater than or equal to 2100 and less than 2500; if yes, the motor vibration sense is made a second value; if not, it is judged whether PeakData is greater than or equal to 2500 and less than 3500; if yes, the motor vibration sense is made a third value; if not, it is judged whether PeakData is greater than or equal to 3500; if yes, the motor vibration sense is made a fourth value; if not, the motor does not vibrate or stops vibrating.
The embodiment of the invention also provides a motor control chip, which comprises a processor and a memory;
the memory is used for storing computer execution instructions;
when the processor executes the computer-executable instructions, the processor performs the motor control method as provided in any of the embodiments above. That is, the processor is configured to execute the motor control method provided in any of the above embodiments.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to each other. Since the device disclosed in the embodiments corresponds to the method disclosed therein, its description is relatively brief, and reference may be made to the description of the method section for relevant details.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.