Disclosure of Invention
In view of the above, the present invention provides a jukebox audio processing method, apparatus, computer device and storage medium that overcome or at least partially solve the above-described problems.
In a first aspect, the present invention provides a method for audio processing of a jukebox, including:
collecting audio information of a singer at a preset sampling frequency;
carrying out noise reduction processing on the audio information to obtain first target audio information;
acquiring accompaniment audio of the song sung by the singer and a vocal range condition corresponding to the song;
and performing audio effect processing on the first target audio information based on the accompaniment audio, the vocal range condition and the first target audio information to obtain second target audio information, and playing the second target audio information.
Further, the collecting the audio information of the singer at the preset sampling frequency includes:
collecting the audio information of the singer at a sampling frequency of 44.1 kHz to 48 kHz.
Further, the performing noise reduction processing on the audio information to obtain the first target audio information includes:
performing primary noise reduction processing on the audio information through a recurrent neural network to obtain primarily noise-reduced audio information;
and performing secondary noise reduction on the primarily noise-reduced audio information through a real-time floating-point noise reduction model to obtain the first target audio information.
Further, the performing audio effect processing on the first target audio information based on the accompaniment audio, the vocal range condition and the first target audio information to obtain the second target audio information, and playing the second target audio information, includes:
performing reverberation effect processing on the first target audio information based on the first target audio information to obtain reverberation audio information;
performing deviation correction processing on the reverberation audio information and the accompaniment audio based on the reverberation audio information and the accompaniment audio to obtain matching audio information;
and performing pitch correction processing on the vocal range of the matching audio information based on the matching audio information and the vocal range condition to obtain the second target audio information.
Further, the performing reverberation effect processing on the first target audio information based on the first target audio information to obtain the reverberation audio information includes:
copying the first target audio information to obtain two pieces of first target audio information;
and playing the two pieces of first target audio information staggered in the time domain to obtain the reverberation audio information.
Further, the performing deviation correction processing on the reverberation audio information and the accompaniment audio based on the reverberation audio information and the accompaniment audio to obtain the matching audio information includes:
judging, based on the reverberation audio information and the accompaniment audio, whether the reverberation audio information leads or lags the rhythm of the accompaniment audio;
and if so, performing deviation correction processing on the reverberation audio information and the accompaniment audio to obtain the matching audio information.
Further, the performing pitch correction processing on the vocal range of the matching audio information based on the matching audio information and the vocal range condition to obtain the second target audio information includes:
judging whether the vocal range of any audio in the matching audio information meets a preset vocal range condition;
and if not: when the vocal range of a first audio exceeds the upper threshold of the corresponding preset vocal range, performing downward pitch correction on the first audio; and when the vocal range of a second audio falls below the lower threshold of the corresponding preset vocal range, performing upward pitch correction on the second audio, thereby obtaining the second target audio information, where the first audio and the second audio are each any audio in the matching audio information.
In a second aspect, the present invention also provides an audio processing apparatus of a jukebox, including:
a sampling module, configured to collect audio information of a singer at a preset sampling frequency;
an obtaining module, configured to perform noise reduction processing on the audio information to obtain first target audio information;
an acquisition module, configured to acquire accompaniment audio of the song sung by the singer and a vocal range condition corresponding to the song;
and a processing module, configured to perform audio effect processing on the first target audio information based on the accompaniment audio, the vocal range condition and the first target audio information to obtain second target audio information, and to play the second target audio information.
In a third aspect, the present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method steps described in the first aspect when executing the program.
In a fourth aspect, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method steps described in the first aspect.
One or more technical solutions in the embodiments of the present invention at least have the following technical effects or advantages:
the invention provides a jukebox audio processing method, including: collecting audio information of a singer at a preset sampling frequency; performing noise reduction processing on the audio information to obtain first target audio information; acquiring accompaniment audio of the song sung by the singer and a vocal range condition corresponding to the song; and performing audio effect processing on the first target audio information based on the accompaniment audio, the vocal range condition and the first target audio information to obtain second target audio information, and playing the second target audio information. The audio information is thus optimized for sound effect from acquisition through processing, thereby improving the playing effect of the jukebox.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example 1
An embodiment of the present invention provides a method for audio processing of a jukebox, as shown in fig. 1, including:
S101, collecting audio information of a singer at a preset sampling frequency;
S102, performing noise reduction processing on the audio information to obtain first target audio information;
S103, acquiring accompaniment audio of the song sung by the singer and a vocal range condition corresponding to the song;
S104, performing audio effect processing on the first target audio information based on the accompaniment audio, the vocal range condition and the first target audio information to obtain second target audio information, and playing the second target audio information.
In a specific embodiment, the audio information of the singer is collected through a microphone, and the sampling frequency must be chosen carefully. Because the auditory range of the human ear is 20 Hz to 20 kHz, according to the Shannon sampling theorem (also called the Nyquist sampling theorem) a sampling rate greater than 40 kHz (twice the upper limit of human hearing) captures the full audible band, so audio sampled above 40 kHz can theoretically be called lossless. The present invention therefore collects the audio information of the singer at a frequency of 44.1 kHz to 48 kHz. Sampling at such frequencies ensures that the original analog audio picked up by the microphone is converted into digital signal data with very low distortion.
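The sampling criterion above can be checked with a short numerical sketch (an illustrative aid, not part of the claimed method; the function name is an assumption):

```python
def min_lossless_rate(max_audible_hz=20_000):
    """Nyquist criterion: the sampling rate must exceed twice the highest
    frequency to be reconstructed, so the full 20 Hz-20 kHz audible band
    needs a rate above 40 kHz."""
    return 2 * max_audible_hz

# Both common jukebox sampling rates satisfy the criterion.
for rate_hz in (44_100, 48_000):
    assert rate_hz > min_lossless_rate()
```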
Next, S102 is performed: noise reduction processing is carried out on the audio information to obtain the first target audio information. Specifically, because environmental noise and circuit noise interfere with the process of converting the original analog audio into digital signal data, noise reduction is required in order to output the human voice with a better effect.
The noise reduction processing in the invention specifically includes performing primary noise reduction processing on the audio information through a recurrent neural network to obtain primarily noise-reduced audio information, and then performing secondary noise reduction on the primarily noise-reduced audio information through a real-time floating-point noise reduction model to obtain the first target audio information.
The structure of the recurrent neural network (RNN) is shown in fig. 2. The network includes an input layer (X), a hidden layer (S) and an output layer (O), together with the weights between the layers: a weight 1 (U) between the input layer (X) and the hidden layer (S), a weight 2 (V) between the hidden layer (S) and the output layer (O), and a weight 3 (W) applied to the hidden-layer state from the previous time step. In a recurrent neural network the current output of a sequence depends on the previous outputs: the network memorizes earlier information and applies it to the calculation of the current output. That is, the nodes of the hidden layer are no longer unconnected; the input of the hidden layer comprises not only the output of the input layer but also the output of the hidden layer at the previous moment. Iterating in this way filters out the superfluous noise. Afterwards, the primarily noise-reduced audio information undergoes secondary noise reduction by the real-time floating-point noise reduction model: the integer sample values of the primarily noise-reduced audio are converted into floating-point values, and filtering is performed again, exploiting the higher precision of floating-point numbers to remove the remaining fine noise, thereby obtaining the first target audio information.
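The recurrence described above can be sketched minimally as follows. This is a hypothetical illustration with scalar weights standing in for the weight matrices U, W and V; the patent does not disclose the network's actual equations or parameters:

```python
import math

def rnn_step(x_t, s_prev, U=1.0, W=0.5, V=1.0):
    """One step of a simple (Elman-style) recurrent cell:
    hidden state s_t = tanh(U*x_t + W*s_prev), output o_t = V*s_t.
    The W*s_prev term feeds the previous hidden state back in, which is
    what makes the current output depend on earlier inputs."""
    s_t = math.tanh(U * x_t + W * s_prev)
    return s_t, V * s_t

# Feed a short sample sequence through the cell, carrying the state forward.
samples = [0.5, 0.6, 0.55, 0.58]
state, outputs = 0.0, []
for x in samples:
    state, o = rnn_step(x, state)
    outputs.append(o)
```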
In S102, after the audio information is processed by these models, the noise spectrum is subtracted, so that noise is reduced, the layering of the voice is enhanced, and the tone quality and listening experience are improved.
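The spectral subtraction mentioned above can be sketched as follows (a generic illustration of the technique on magnitude spectra; the patent does not specify how the noise spectrum is estimated):

```python
def spectral_subtract(magnitudes, noise_estimate):
    """Spectral-subtraction sketch: subtract an estimated noise magnitude
    from each frequency bin of the signal spectrum, flooring at zero so
    bins dominated by noise are silenced rather than driven negative."""
    return [max(m - n, 0.0) for m, n in zip(magnitudes, noise_estimate)]

# Bins well above the noise floor keep most of their energy;
# bins at or below it are zeroed.
cleaned = spectral_subtract([1.0, 0.2, 0.8], [0.3, 0.5, 0.1])
```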
Next, S103 is executed to acquire the accompaniment audio of the song sung by the singer and the vocal range condition corresponding to the song. After that, S104 is executed to perform audio effect processing on the first target audio information based on the accompaniment audio, the vocal range condition and the first target audio information, obtain the second target audio information, and play the second target audio information.
First, the accompaniment audio of a given song and the corresponding vocal range condition can be obtained directly from the song list. After these pieces of information are determined, audio effect processing is performed on the first target audio information.
Specifically, the audio effect processing includes three processing modes: reverberation effect processing, deviation correction processing and pitch correction processing. They may be performed in that order, that is, reverberation effect processing first, then deviation correction processing, and finally pitch correction processing; however, the order of the three processing modes may also be exchanged, and this will not be described in detail here.
The processing is carried out as follows: performing reverberation effect processing on the first target audio information based on the first target audio information to obtain reverberation audio information; performing deviation correction processing on the reverberation audio information and the accompaniment audio based on the reverberation audio information and the accompaniment audio to obtain matching audio information; and performing pitch correction processing on the vocal range of the matching audio information based on the matching audio information and the vocal range condition to obtain the second target audio information.
The reverberation effect processing includes copying the first target audio information to obtain two pieces of first target audio information, and playing the two pieces staggered in the time domain to obtain the reverberation audio information.
Specifically, as shown in fig. 3, one piece of first target audio information 301 and the other piece of first target audio information 302 are played over different time periods. For example, one piece 301 is played in a first time period and the other piece 302 in a second time period, the second time period being later than the first, with a time interval T between them of 1 s to 3 s. Playing the two pieces of first target audio information in this way yields the reverberation audio information.
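The staggered-copy scheme of fig. 3 can be sketched on raw samples as follows (the mixing level of the second copy is an assumption; the patent specifies only the time offset between the two copies):

```python
def delayed_copy_reverb(samples, delay_samples, mix=0.5):
    """Reverberation by time-staggered copy: overlay the original signal
    with a copy of itself shifted later by `delay_samples` samples.
    `mix` is the assumed level of the delayed copy."""
    out = list(samples) + [0.0] * delay_samples
    for i, s in enumerate(samples):
        out[i + delay_samples] += mix * s
    return out

# A single impulse produces the original click plus a quieter echo.
echoed = delayed_copy_reverb([1.0, 0.0, 0.0], delay_samples=2)
```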
The deviation correction processing specifically includes judging, based on the reverberation audio information and the accompaniment audio, whether the reverberation audio information leads or lags the rhythm of the accompaniment audio, and if so, performing deviation correction on the reverberation audio information and the accompaniment audio to obtain the matching audio information.
In particular, the reverberation audio information is corrected according to the rhythm of the accompaniment audio: when the reverberation audio information leads the rhythm of the accompaniment audio, it is delayed, and when it lags the rhythm of the accompaniment audio, it is advanced. In this way the reverberation audio information can be matched with the accompaniment audio, thereby obtaining the matching audio information.
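The lead/lag judgment can be sketched with a naive cross-correlation over beat envelopes (an illustrative approach; the patent does not disclose how lead or lag is actually detected, and the frame-level envelopes and lag range here are assumptions):

```python
def rhythm_offset(vocal, accomp, max_lag=4):
    """Estimate whether the vocal leads or lags the accompaniment by
    cross-correlating over small frame lags. A positive result means the
    vocal is early (leads) and should be delayed; a negative result means
    it lags and should be advanced."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            v * accomp[i + lag]
            for i, v in enumerate(vocal)
            if 0 <= i + lag < len(accomp)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Vocal beat at frame 2, accompaniment beat at frame 3: the vocal leads by 1.
lead = rhythm_offset([0, 0, 1, 0, 0], [0, 0, 0, 1, 0])
```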
The pitch correction processing specifically includes: judging whether the vocal range of any audio in the matching audio information meets a preset vocal range condition; and if not: when the vocal range of a first audio exceeds the upper threshold of the corresponding preset vocal range, performing downward pitch correction on the first audio, and when the vocal range of a second audio falls below the lower threshold of the corresponding preset vocal range, performing upward pitch correction on the second audio, thereby obtaining the second target audio information, where the first audio and the second audio are each any audio in the matching audio information.
By correcting any audio that violates the upper or lower threshold of the preset vocal range, the singer's voice is brought within the preset vocal range, and the sound effect is improved.
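The threshold-based correction can be sketched as a clamp on per-note fundamental frequencies (an illustrative simplification; a real pitch corrector would resynthesize the audio rather than clamp numbers):

```python
def correct_vocal_range(pitches_hz, low_hz, high_hz):
    """Pitch-correction sketch: a note above the preset upper threshold is
    pressed down to it, a note below the lower threshold is lifted up to
    it, and notes inside the range pass through unchanged."""
    return [min(max(p, low_hz), high_hz) for p in pitches_hz]

# With an assumed preset range of 100-400 Hz, out-of-range notes
# are pulled to the nearest threshold.
corrected = correct_vocal_range([80.0, 300.0, 500.0], 100.0, 400.0)
```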
As noted above, the order of the audio effect processing steps may also be changed, and this will not be repeated here. In this way, second target audio information with a better sound effect is obtained and finally played.
One or more technical solutions in the embodiments of the present invention at least have the following technical effects or advantages:
the invention provides a jukebox audio processing method, including: collecting audio information of a singer at a preset sampling frequency; performing noise reduction processing on the audio information to obtain first target audio information; acquiring accompaniment audio of the song sung by the singer and a vocal range condition corresponding to the song; and performing audio effect processing on the first target audio information based on the accompaniment audio, the vocal range condition and the first target audio information to obtain second target audio information, and playing the second target audio information. The audio information is thus optimized for sound effect from acquisition through processing, thereby improving the playing effect of the jukebox.
Example 2
Based on the same inventive concept, the embodiment of the present invention further provides an audio processing device of a jukebox, as shown in fig. 4, including:
a sampling module 401, configured to collect audio information of a singer at a preset sampling frequency;
an obtaining module 402, configured to perform noise reduction processing on the audio information to obtain first target audio information;
an acquisition module 403, configured to acquire accompaniment audio of the song sung by the singer and a vocal range condition corresponding to the song;
and a processing module 404, configured to perform audio effect processing on the first target audio information based on the accompaniment audio, the vocal range condition and the first target audio information to obtain second target audio information, and to play the second target audio information.
In an alternative embodiment, the sampling module 401 is configured to collect the audio information of the singer at a sampling frequency of 44.1 kHz to 48 kHz.
In an alternative embodiment, the obtaining module 402 is configured to perform primary noise reduction processing on the audio information through a recurrent neural network to obtain primarily noise-reduced audio information, and to perform secondary noise reduction on the primarily noise-reduced audio information through a real-time floating-point noise reduction model to obtain the first target audio information.
In an alternative embodiment, the processing module 404 includes:
a first obtaining unit, configured to perform reverberation effect processing on the first target audio information based on the first target audio information to obtain reverberation audio information;
a second obtaining unit, configured to perform deviation correction processing on the reverberation audio information and the accompaniment audio based on the reverberation audio information and the accompaniment audio to obtain matching audio information;
and a third obtaining unit, configured to perform pitch correction processing on the vocal range of the matching audio information based on the matching audio information and the vocal range condition to obtain the second target audio information.
In an alternative embodiment, the first obtaining unit is configured to copy the first target audio information to obtain two pieces of first target audio information, and to play the two pieces staggered in the time domain to obtain the reverberation audio information.
In an alternative embodiment, the second obtaining unit is configured to judge, based on the reverberation audio information and the accompaniment audio, whether the reverberation audio information leads or lags the rhythm of the accompaniment audio, and if so, to perform deviation correction processing on the reverberation audio information and the accompaniment audio to obtain the matching audio information.
In an alternative embodiment, the third obtaining unit is configured to judge whether the vocal range of any audio in the matching audio information meets a preset vocal range condition, and if not: when the vocal range of a first audio exceeds the upper threshold of the corresponding preset vocal range, to perform downward pitch correction on the first audio; and when the vocal range of a second audio falls below the lower threshold of the corresponding preset vocal range, to perform upward pitch correction on the second audio, thereby obtaining the second target audio information, where the first audio and the second audio are each any audio in the matching audio information.
Example 3
Based on the same inventive concept, an embodiment of the present invention provides a computer device, as shown in fig. 5, including a memory 504, a processor 502, and a computer program stored in the memory 504 and executable on the processor 502, where the processor 502 implements the steps of the aforementioned jukebox audio processing method when executing the program.
In FIG. 5, a bus architecture is represented by bus 500. Bus 500 may include any number of interconnected buses and bridges, linking together various circuits, including one or more processors, represented by processor 502, and memory, represented by memory 504. Bus 500 may also link together various other circuits, such as peripheral devices, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. Bus interface 506 provides an interface between bus 500 and receiver 501 and transmitter 503. The receiver 501 and the transmitter 503 may be the same element, i.e. a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 502 is responsible for managing the bus 500 and general processing, while the memory 504 may be used to store data used by the processor 502 in performing operations.
Example 4
Based on the same inventive concept, an embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the jukebox audio processing method described above.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system or other apparatus. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such a system is apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages; the above description of specific languages is provided for disclosure of the enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each embodiment. Rather, as each embodiment reflects, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in a specific implementation, any of the claimed embodiments may be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components of a jukebox audio processing device, computer device according to embodiments of the present invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names.