CN119497017A - Audio signal processing method, device, terminal equipment and storage medium - Google Patents

Audio signal processing method, device, terminal equipment and storage medium

Info

Publication number
CN119497017A
CN119497017A (application CN202311023056.4A)
Authority
CN
China
Prior art keywords
acquisition
audio
acquisition unit
audio signal
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311023056.4A
Other languages
Chinese (zh)
Inventor
吴东哲
张丽娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202311023056.4A
Publication of CN119497017A
Legal status: Pending (current)

Abstract

The method comprises: acquiring a first audio signal through a first acquisition unit of the terminal device and a second audio signal through a second acquisition unit of the terminal device, wherein the difference between the distances from the first and second acquisition units to a first sound source is within a first range, the difference between their distances to a second sound source is larger than the maximum value of the first range, and the first sound source is a target acquisition object; superimposing the first audio signal and the second audio signal in the time domain to obtain a superimposed audio signal; and filtering out noise audio signals with amplitude smaller than a preset value from the superimposed audio signal to obtain a target audio signal of the target acquisition object. Interference from other sound sources with the desired sound is thereby reduced, the desired audio signal is obtained, noise is reduced, and the quality of the audio signal is improved.

Description

Audio signal processing method, device, terminal equipment and storage medium
Technical Field
The present disclosure relates to the technical field of audio signal processing, and in particular to an audio signal processing method and apparatus, a terminal device, and a storage medium.
Background
With the development of audio technology, more and more terminal devices have audio signal processing functions. A terminal device with such a function is generally provided with a microphone, through which audio signals from the environment where the device is located can be collected.
In some application scenarios, when there are multiple sound sources in the environment where the terminal device is located, the microphone collects all the sounds in the current environment; for example, if multiple people in the current scene are speaking, the microphone collects each person's voice. Depending on the business scenario, some of these sounds may be needed and others may not.
Disclosure of Invention
The disclosure provides an audio signal processing method, an audio signal processing device, terminal equipment and a storage medium.
According to a first aspect of the embodiments of the present disclosure, an audio signal processing method is provided, applied to a terminal device. The method includes: acquiring a first audio signal through a first acquisition unit of the terminal device and a second audio signal through a second acquisition unit of the terminal device, wherein the difference between the distances from the first and second acquisition units to a first sound source is within a first range, the difference between their distances to a second sound source is larger than the maximum value of the first range, and the first sound source is a target acquisition object; superimposing the first audio signal and the second audio signal in the time domain to obtain a superimposed audio signal; and filtering out noise audio signals with amplitude smaller than a preset value from the superimposed audio signal to obtain a target audio signal of the target acquisition object.
In one embodiment, the terminal device comprises a rectangular frame and a rear shell, the rectangular frame comprising a group of long sides and a group of short sides connected end to end. When the terminal device is in a landscape posture, the acquisition surface of the first acquisition unit and the acquisition surface of the second acquisition unit are located at different positions on the same long side; when the terminal device is in a portrait posture, the two acquisition surfaces are located on different sides of the rectangular frame; and when the terminal device is in an inclined or upright posture, the acquisition surface of the first acquisition unit is located on the rectangular frame and the acquisition surface of the second acquisition unit is located on the rear shell.
In one embodiment, the method further comprises: determining an acquisition scene; and setting audio processing parameters of the terminal device according to the current acquisition scene, wherein the audio processing parameters are at least used for determining the acquisition unit and/or acquisition parameters used for audio acquisition in that scene;
The acquisition scenes comprise a first scene, which is a scene in which audio acquisition is performed on a single sound source; in the first scene, the audio processing parameters are at least used for determining that the first acquisition unit and the second acquisition unit perform the audio acquisition and filtering.
In one embodiment, the audio processing parameters corresponding to the first scene comprise: the preset value; first direction information indicating the sound collection directions of the first acquisition unit and the second acquisition unit; and a first noise reduction parameter for reducing environmental noise in the first scene.
In one embodiment, the acquisition scene further comprises a second scene, wherein in the second scene, audio signal acquisition is performed on a plurality of sound sources through at least one audio acquisition unit in the terminal device.
In one embodiment, the audio processing parameters corresponding to the second scene comprise at least one of: acquisition unit information for determining the acquisition units that collect the plurality of sound sources; second direction information for the sound collection directions of the plurality of sound sources; distance information for determining the acquisition distances of the sound sources; and a second noise reduction parameter for reducing environmental sound in the second scene.
In one embodiment, the acquisition scene further comprises a third scene, wherein the third scene is used for acquiring audio based on an audio acquisition peripheral independent of the terminal device.
In one embodiment, the audio processing parameters corresponding to the third scene comprise at least one of: first indication information indicating that an audio acquisition peripheral connected to the terminal device is to be used for audio signal acquisition; a third noise reduction parameter indicating that the environmental denoising function is to be disabled when an acquisition unit on the terminal device is used for audio acquisition; and second indication information indicating that an acquisition unit of the terminal device is to collect environmental sound.
In one embodiment, the determining the acquisition scene comprises determining the acquisition scene according to an input operation acting on the terminal device, or determining the acquisition scene according to whether an audio acquisition peripheral connected with the terminal device exists.
A second aspect of the embodiments of the present disclosure provides an audio signal processing apparatus, including:
an acquisition module, configured to acquire a first audio signal through a first acquisition unit and a second audio signal through a second acquisition unit, wherein the difference between the distances from the first and second acquisition units to a first sound source is within a first range, the difference between their distances to a second sound source is larger than the maximum value of the first range, and the first sound source is a target acquisition object; a superposition module, configured to superimpose the first audio signal and the second audio signal in the time domain to obtain a superimposed audio signal; and a filtering module, configured to filter out noise audio signals with amplitude smaller than a preset value from the superimposed audio signal to obtain a target audio signal of the target acquisition object.
A third aspect of the embodiments of the present disclosure provides a terminal device, including:
a processor and a memory for storing executable instructions capable of executing on the processor, wherein the processor is configured to execute the method of any of the embodiments described above when the executable instructions are executed.
In a fourth aspect of the disclosed embodiments, there is provided a non-transitory computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement the method of any of the above embodiments.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
According to the audio signal processing method described above, different audio signals are collected by different acquisition units: the first acquisition unit collects the first audio signal and the second acquisition unit collects the second audio signal. When the two sound sources emit sound, the two acquisition units each collect the audio signals from both sources. Because the difference between the distances from the two acquisition units to the first sound source is within the first range, the first sound source's audio signals collected by the two units differ little in the time domain; superimposing them enhances the signal and increases its amplitude. Because the difference between the distances from the two acquisition units to the second sound source is larger than the maximum value of the first range, the second sound source's audio signals collected by the two units differ substantially in the time domain, and their amplitude after superposition is not obviously increased. Noise signals with amplitude smaller than the preset value, which include the audio signals emitted by the second sound source, can therefore be filtered out of the superimposed audio signal. In this way the second sound source's audio signals are removed from the signals obtained by the two acquisition units, leaving the audio signals emitted by the first sound source. Interference of the second sound source with the audio signal of the first sound source is reduced, the target audio signal of the target acquisition object is obtained, the noise in the target audio signal is reduced, and its quality is improved, thereby improving the user experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic diagram illustrating an audio signal processing method according to an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating a distribution of an audio signal acquisition unit according to an exemplary embodiment;
FIG. 3 is a schematic diagram illustrating a distribution of another audio signal acquisition unit according to an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating one determination of an application scenario in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram illustrating one determination of an acquisition scene, according to an example embodiment;
FIG. 6 is an interface diagram illustrating a determination of acquisition scenes based on FIG. 5, according to an exemplary embodiment;
fig. 7 is a schematic diagram of an audio signal processing device according to an exemplary embodiment;
Fig. 8 is a schematic diagram illustrating an audio signal processing method according to an exemplary embodiment;
Fig. 9 is a schematic diagram illustrating an audio signal processing method according to an exemplary embodiment;
Fig. 10 is a block diagram of a terminal device according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus consistent with some aspects of the disclosure as detailed in the accompanying claims.
Voice over Internet Protocol (VoIP) calls have gradually replaced SIM-card-based calls, and VoIP has become a major channel of information transfer in everyday life and work, for example in online conferences and chats over instant messaging tools. VoIP call quality is an important criterion when evaluating the call performance of terminal devices such as mobile phones and tablets. Call quality in real scenes is often affected by noisy surroundings: for example, when a user participates in an online conference, the user's voice is transmitted to the receiving end together with ambient sound and other people's speech, so the listener at the receiving end hears a noisy mix and the call experience is degraded.
In general, an audio acquisition unit (such as a microphone) on a terminal device such as a tablet or mobile phone performs 360-degree omnidirectional pickup. When the device is used in a noisy environment, other sounds in the environment are collected by the microphone along with the sound that is actually wanted; even if the terminal device's noise reduction algorithm suppresses ambient sound, other voices in the environment remain, so the desired sound is disturbed or even masked by them.
For example, when a user joins a video conference at home, after the microphone performs 360-degree omnidirectional pickup, the voices of other family members speaking nearby are also recorded and transmitted to the receiving end. The received sound is therefore noisy, which hinders the receiving user from obtaining the desired sound, reduces call quality, and also compromises the user's privacy.
Referring to fig. 1, which is a schematic diagram of an audio signal processing method, the method may be performed at least by a terminal device having an audio signal acquisition unit. The terminal device may include mobile terminal devices and fixed terminal devices; that is, the body performing the method includes at least mobile and fixed terminal devices. Mobile terminal devices may include mobile phones, tablet computers, wearable devices, and the like.
The audio signal processing method includes:
S100, acquiring a first audio signal through a first acquisition unit of the terminal device and a second audio signal through a second acquisition unit of the terminal device, wherein the difference between the distances from the first and second acquisition units to a first sound source is within a first range, the difference between their distances to a second sound source is larger than the maximum value of the first range, and the first sound source is a target acquisition object.
And S200, overlapping the first audio signal and the second audio signal in the time domain to obtain an overlapped audio signal.
S300, filtering noise audio signals with the amplitude smaller than a preset value in the superimposed audio signals to obtain target audio signals of the target acquisition objects.
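Steps S100 to S300 can be sketched in Python (the patent does not specify an implementation; the function name, the parameter names, and the sample-wise amplitude gate used to model the filtering step are illustrative assumptions, not the patent's stated method):

```python
import numpy as np

def extract_target_audio(first_signal: np.ndarray,
                         second_signal: np.ndarray,
                         preset_value: float) -> np.ndarray:
    """Illustrative sketch of S200/S300: superimpose the two captured
    signals in the time domain, then filter out content whose amplitude
    stays below the preset value (modeled as a crude sample-wise gate)."""
    # S200: time-domain superposition of the two acquired signals
    superimposed = first_signal + second_signal
    # S300: samples below the preset amplitude are treated as noise
    return np.where(np.abs(superimposed) >= preset_value, superimposed, 0.0)
```

With aligned captures the target content adds up and survives the gate, while content that does not reinforce stays below the preset value and is zeroed.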
In the terminal device of this embodiment, the terminal device has a plurality of audio signal acquisition units, simply referred to as acquisition units, the positions of the respective acquisition units being different. For example, the terminal device may include 3, 4 or 5 acquisition units, and the acquisition units may be located on a frame of the terminal device or on a back surface, i.e., a rear shell, of the terminal device. The terminal device may be a terminal device having a display screen, where the back side may be the side of the terminal device opposite the display screen.
For S100, the audio signal pickup unit may include a microphone, which may also be referred to as a pickup.
Illustratively, acquisition units 1 and 2 are located on a first frame of the terminal device, acquisition unit 4 is located on a second frame of the terminal device, and acquisition unit 3 is located on the rear housing of the terminal device. The first frame may be perpendicular to the second frame, or the first frame may be parallel to the second frame.
Referring to fig. 2, a schematic distribution diagram of the acquisition units, each acquisition unit is represented in fig. 2 by an omnidirectional microphone; fig. 2 is a front view of the terminal device, the front side being the side on which the display screen is located. The omnidirectional microphone mic1 and the omnidirectional microphone mic2 are located at different positions on the same frame of the terminal device. The terminal device has a rectangular frame; the omnidirectional microphone mic4 is located on another frame of the terminal device, the two frames being perpendicular and forming two sides of the rectangular frame.
Illustratively, the collecting surface of the omnidirectional microphone mic1 and the collecting surface of the omnidirectional microphone mic2 are located on the same frame.
Referring to fig. 3, another schematic distribution diagram of the acquisition unit is shown in fig. 3 by an omni-directional microphone, and fig. 3 is a back view of the terminal device, i.e. a schematic diagram of the rear shell. An omnidirectional microphone mic3 is located on the rear housing.
Illustratively, the collecting face of the omnidirectional microphone mic3 is located on the rear housing.
Also shown in fig. 2 and 3 are a speaker, cameras (e.g., a front-facing camera), and a USB interface.
In one embodiment, fig. 2 and fig. 3 are schematic diagrams of the terminal device in a horizontal screen posture, in which fig. 2 is rotated 90 degrees clockwise to be a schematic diagram of the vertical screen posture of the terminal device, and fig. 3 is rotated 90 degrees counterclockwise to be a schematic diagram of the vertical screen posture of the terminal device.
In one embodiment, the terminal device is a tablet computer or a mobile phone. Fig. 2 and 3 are schematic diagrams of tablet computers, for example.
Any two different acquisition units are separated by a preset distance, which may be determined according to the design of the terminal device. For example, the omnidirectional microphones mic1 and mic2 shown in fig. 2 and 3 have a preset distance between them and may trisect the first frame. The omnidirectional microphone mic4 may be located at the middle of the second frame. The first frame is perpendicular to the second frame.
Illustratively, the first pickup unit may be the omnidirectional microphone mic1 in fig. 2 and 3, and the second pickup unit may be the omnidirectional microphone mic2 in fig. 2 and 3.
The sound emitted by each sound source is collected by two of the plurality of audio signal acquisition units in the terminal device. Which two acquisition units are used is not fixed and may be determined according to the actual usage scene.
For example, in the state shown in fig. 2, the terminal device may collect the audio signals sent by the first sound source through the omnidirectional microphone mic1 and the omnidirectional microphone mic2, and collect the audio signals sent by the second sound source through the omnidirectional microphone mic1 and the omnidirectional microphone mic2, respectively. The omnidirectional microphone mic1 can be used as a first acquisition unit, and the omnidirectional microphone mic2 can be used as a second acquisition unit.
For another example, the omnidirectional microphone mic1 and the omnidirectional microphone mic4 collect audio signals of respective sound sources, respectively, the omnidirectional microphone mic1 may be used as a first collecting unit, and the omnidirectional microphone mic4 may be used as a second collecting unit.
For another example, in the state shown in fig. 3, the terminal device may use any one of the omnidirectional microphone mic1, the omnidirectional microphone mic2, and the omnidirectional microphone mic4 as the first acquisition unit, and the omnidirectional microphone mic3 may be used as the second acquisition unit.
The audio signals collected by the first collecting unit are recorded as first audio signals, and the audio signals collected by the second collecting unit are recorded as second audio signals.
For example, if the second sound source is located on the left or right side of the terminal device shown in fig. 2 and emits sound 1, both the first acquisition unit (such as the omnidirectional microphone mic1) and the second acquisition unit (such as the omnidirectional microphone mic2) may collect sound 1. The sound 1 collected by the first acquisition unit is recorded as the first audio signal, and the sound 1 collected by the second acquisition unit is recorded as the second audio signal.
Because the distances from the second sound source to the first acquisition unit and to the second acquisition unit differ, sound 1 takes different amounts of time to travel to the two units, so the first and second acquisition units receive sound 1 at different times.
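The arrival-time offset described here follows directly from the path-length difference divided by the speed of sound (roughly 343 m/s in air at room temperature). A small illustrative helper, with hypothetical distances:

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at ~20 °C

def arrival_time_difference(dist_to_mic1_m: float, dist_to_mic2_m: float) -> float:
    """Difference in arrival time (seconds) of one sound at two microphones,
    given the source's distance to each. Names and values are illustrative."""
    return abs(dist_to_mic1_m - dist_to_mic2_m) / SPEED_OF_SOUND_M_S

# A source 0.10 m closer to mic1 than to mic2 arrives about 0.29 ms earlier.
dt = arrival_time_difference(0.50, 0.60)
```

Equal path lengths give a zero offset, which is the "first range" case for the target source.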
For another example, if the first sound source faces the front of the terminal device shown in fig. 2 and the distances between the first sound source and the omnidirectional microphone mic1 and the omnidirectional microphone mic2 are equal, the time when the sound 2 emitted by the first sound source is transmitted to the omnidirectional microphone mic1 and the omnidirectional microphone mic2 is the same, and the time when the sound 2 is collected by the omnidirectional microphone mic1 and the omnidirectional microphone mic2 is the same.
The first sound source may be, for example, the user of the terminal device.
After a sound source emits sound, each acquisition unit on the terminal device can acquire the sound. After the different sound sources emit the sound, each collecting unit can collect the sound emitted by each sound source. The sound emitted by each sound source is within the acquisition range of the acquisition unit.
In one embodiment, there are two sound sources, a first sound source and a second sound source, and the audio signals emitted by the two sound sources are collected by the first collection unit and the second collection unit. The first sound source is a target acquisition object, and the finally obtained audio signal is an audio signal sent by the first sound source. The second sound source is respectively separated from the first acquisition unit and the second acquisition unit by a distance, and the distance difference between the two distances is larger than the maximum value of the first range.
The first range may be determined according to actual use requirements, and may be a distance range in centimeters or a distance range in millimeters. Specific numerical values are not limited.
The first audio signal can be acquired through the first acquisition unit, the second audio signal can be acquired through the second acquisition unit, and the first audio signal and the second audio signal can be the audio signal of the same sound source. For example, it may be determined from a spectral analysis of the audio signals whether the first audio signal and the second audio signal are audio signals of the same sound source.
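The paragraph above mentions spectral analysis for matching the two captures to one source. One common alternative sketch (an assumption here, not the patent's stated method) estimates the relative lag between the two captures by cross-correlation:

```python
import numpy as np

def estimate_lag_samples(sig_a: np.ndarray, sig_b: np.ndarray) -> int:
    """Return the shift k (in samples) that best aligns sig_a[n + k]
    with sig_b[n], found as the peak of the full cross-correlation."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    return int(np.argmax(corr)) - (len(sig_b) - 1)
```

A lag near zero is consistent with the first sound source (near-equal path lengths to both units); a large lag suggests the second sound source.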
The first audio signal and the second audio signal may be audio signals emitted by a first sound source, and the first audio signal and the second audio signal may also be audio signals emitted by a second sound source.
After the first audio signal and the second audio signal are acquired, the spectral information of the first audio signal in the time domain and the spectral information of the second audio signal in the time domain can be determined.
The larger the difference between the distances from a sound source to the two acquisition units, the larger the time-domain difference between the audio signals of that source collected by the two units, and the farther apart their amplitude peaks. Conversely, the smaller the distance difference, the smaller the time-domain difference, and the closer together the amplitude peaks.
For S200, after the first audio signal and the second audio signal are obtained, the first audio signal and the second audio signal are superimposed in the time domain, and a superimposed audio signal is obtained. When the difference of the frequency spectrum information of the first audio signal and the second audio signal in the time domain is large, the amplitude overlapping degree of the first audio signal and the second audio signal is low, the amplitude of the overlapped audio signal may be small, and the enhancement effect is small.
When the difference of the frequency spectrum information of the first audio signal and the second audio signal in the time domain is small, the amplitude overlapping degree of the first audio signal and the second audio signal is high, the amplitude of the overlapped audio signal may be large, and the enhancement effect is large. The amplitude of the superimposed audio signal comprises a maximum amplitude.
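The contrast described above can be checked numerically. In this sketch (the 48 kHz sample rate and 1 kHz test tone are assumed for illustration), two aligned copies of a waveform roughly double the peak amplitude, while a half-period offset largely cancels it:

```python
import numpy as np

fs = 48_000                               # assumed sample rate (Hz)
t = np.arange(fs) / fs                    # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)       # 1 kHz source waveform

aligned_sum = tone + tone                 # near-zero path difference
delayed = np.roll(tone, fs // 2000)       # half-period (0.5 ms) circular delay
offset_sum = tone + delayed               # large path difference

peak_aligned = np.max(np.abs(aligned_sum))   # constructive: about 2.0
peak_offset = np.max(np.abs(offset_sum))     # destructive: near 0.0
```

Real delays fall between these extremes, but the aligned case always gains the most amplitude, which is what the preset-value filter exploits.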
For S300, the superimposed audio signal is filtered against the preset value: audio signals whose amplitude is smaller than the preset value are filtered out of the superimposed audio signal as noise signals, yielding the target audio signal. In this way, sound from sources with a larger distance difference to the two acquisition units, such as sources to the left or right of the terminal device or in the direction of the rear shell, is filtered out; the audio signal emitted by the target acquisition object is obtained, the noise in the obtained audio signal is reduced, and the quality of the target audio signal is improved.
Because the difference between the distances from the two acquisition units to the first sound source is within the first range, the first sound source's audio signals collected by the two units differ little in the time domain; after superposition, these signals are enhanced and their amplitude increases. Because the difference between the distances from the two acquisition units to the second sound source is larger than the maximum value of the first range, the second sound source's audio signals collected by the two units differ substantially in the time domain, and their amplitude after superposition is not obviously increased. Noise signals with amplitude smaller than the preset value, which include the audio signals emitted by the second sound source, can therefore be filtered out according to the preset value. In this way the second sound source's audio signals are removed from the signals obtained by the two acquisition units, and the audio signals emitted by the first sound source are obtained. Interference of the second sound source with the audio signal of the first sound source is reduced, the target audio signal of the target acquisition object is obtained, the noise in the target audio signal is reduced, and its quality is improved, thereby improving the user experience.
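This effect can be sketched end-to-end (all tones, amplitudes, and the preset value are hypothetical; the second source's large path difference is modeled as a half-period inter-microphone offset, and the filtering as a crude sample-wise amplitude gate):

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 500 * t)             # first (target) sound source
interferer = 0.4 * np.sin(2 * np.pi * 800 * t)   # second sound source

# First source: near-equal paths, so both microphones capture it in phase.
# Second source: large path difference, modeled as a half-period offset.
mic1 = target + interferer
mic2 = target + 0.4 * np.sin(2 * np.pi * 800 * t - np.pi)

superimposed = mic1 + mic2          # target doubles; interferer cancels out
preset_value = 1.0
filtered = np.where(np.abs(superimposed) >= preset_value, superimposed, 0.0)
```

After superposition the residual of the second source sits far below the preset value, so the gate passes only the reinforced first-source signal.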
In one embodiment, the terminal device comprises a rectangular frame and a rear shell, wherein the rectangular frame comprises a group of long sides parallel to each other and a group of short sides parallel to each other, and the long sides and the short sides are connected end to end.
The largest rectangle shown in fig. 2 and 3 is the frame, comprising two long sides parallel to each other and two short sides parallel to each other, connected end to end to form the rectangular frame. Illustratively, each pair of parallel sides forms a group.
Referring to fig. 2 and 3, the terminal device is in a landscape screen posture, and when the terminal device is in the landscape screen posture, the acquisition surface of the first acquisition unit and the acquisition surface of the second acquisition unit are located at different positions on the same long side. For example, the uppermost frame shown in fig. 2 is a long-side frame, and the collecting surface of the first collecting unit and the second collecting unit are located at different positions of the top frame, that is, the omnidirectional microphone mic1 and the omnidirectional microphone mic2 are located on the same long-side frame.
Illustratively, the omni-directional microphone mic1 and the omni-directional microphone mic2 may also be located on the lowest long-sided rim.
For example, one of the omnidirectional microphones mic1 and mic2 may also be located on a long-side frame and the other on a short-side frame.
For example, in an actual application scenario, such as voice communication or video communication using instant messaging software, the terminal device may be in a landscape screen posture.
In one embodiment, the acquisition face of the first acquisition unit and the acquisition face of the second acquisition unit are located on different sides of the rectangular frame when the terminal device is in the portrait orientation.
Illustratively, the collection face of the first collection unit is located on the long side frame and the collection face of the second collection unit is located on the short side frame. The omnidirectional microphone mic1 as in fig. 2 is located on the long side frame and the omnidirectional microphone mic4 on the short side frame.
Illustratively, the collection face of the first collection unit is located on a first long-sided frame and the collection face of the second collection unit is located on another long-sided frame.
In one embodiment, when the terminal device is in an inclined state or a vertical posture, the collecting surface of the first collecting unit is located on the rectangular frame, and the collecting surface of the second collecting unit is located on the rear shell.
Illustratively, the collection face of the first collection unit is located on the rim of either long side and the collection face of the second collection unit is located on the rear housing. The omnidirectional microphone mic1 is located on the long side frame as in fig. 3, and the omnidirectional microphone mic3 is located on the rear case.
In one embodiment, referring to fig. 4, a schematic diagram of determining an application scenario is shown, and the method further includes:
s10, determining an acquisition scene;
s20, setting audio processing parameters of the terminal equipment according to the current acquisition scene, wherein the audio processing parameters are at least used for determining an acquisition unit and/or acquisition parameters for audio acquisition in the acquisition scene.
The acquisition scene comprises a first scene, wherein the first scene is a scene for carrying out audio acquisition on a single sound source;
In the first scene, the audio processing parameters are at least used to determine that the first acquisition unit and the second acquisition unit acquire the audio and that the filtering is performed.
In S10, determining an acquisition scene includes:
The acquisition scene is determined according to an input operation acting on the terminal device, or according to whether a connection to an audio acquisition peripheral of the terminal device has been established.
Different input operations determine different acquisition scenes, and different acquisition scenes correspond to different audio processing parameters. When the audio processing parameters differ, the way the acquisition units acquire the audio signals differs, the way the acquired signals are processed differs, and the acquisition units used for acquisition may also differ. The acquisition mode can be determined from the acquisition parameters; when the acquisition parameters differ, the acquisition modes can differ as well.
The acquisition scene can also be determined according to whether the terminal device is connected to an audio acquisition peripheral. If it is, an acquisition scene involving audio signal acquisition by the peripheral can be determined, such as a scene in which audio signals are collected through the audio acquisition peripheral.
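As a sketch, the decision in S10 could be expressed as a small dispatch function. The scene labels, the selection values, and the peripheral-first priority order are assumptions for illustration, not part of the disclosed method.

```python
# Hypothetical scene labels (assumed names, for illustration only).
FIRST_SCENE = "single_source"    # audio acquisition for a single sound source
SECOND_SCENE = "multi_source"    # acquisition for several surrounding sources
THIRD_SCENE = "peripheral"       # acquisition via an external audio peripheral

def determine_acquisition_scene(user_selection=None, peripheral_connected=False):
    """Determine the acquisition scene (step S10).

    Either an input operation on the terminal (user_selection) or an
    established connection to an audio acquisition peripheral decides
    the scene; the priority order here is an assumption.
    """
    if peripheral_connected:          # a connected peripheral implies the third scene
        return THIRD_SCENE
    if user_selection == "single":    # e.g. the single-person control in fig. 6
        return FIRST_SCENE
    return SECOND_SCENE               # multi-person pickup as the default
```

The scene value returned here would then drive step S20, where the audio processing parameters for that scene are set.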
Referring to fig. 5, an interface diagram for determining an acquisition scene is shown, and referring to fig. 6, an interface diagram for determining an acquisition scene based on fig. 5 is shown.
Fig. 5 is a schematic diagram of an interface in a VoIP-based application on the terminal device, such as an application with conference functionality. The interface includes a conference toolbox containing a call noise reduction control, and may be displayed after the application is started. The conference toolbox may also be a standalone application whose opened interface is shown in fig. 5. Operating the call noise reduction control brings up the interface shown in fig. 6.
Fig. 6 is a detailed view of the call noise reduction interface referenced in fig. 5. It illustrates an interface for determining acquisition scenes, including a single-person scene, a multi-person scene, microphone noise reduction, and speaker/earpiece noise reduction. As shown in fig. 6, the call noise reduction interface includes a switch controlling whether microphone noise reduction is on, together with scene controls, and the acquisition scene can be determined from the input operation. The first scene may include the single-person scene, in which a microphone picks up sound directly in front of the screen; the first scene is a scene in which audio is collected from a single sound source.
After the single person acquisition scene is selected, in the first scene, the audio processing parameters are at least used for determining that the first acquisition unit and the second acquisition unit acquire audio and filter, and the target audio signal can be determined according to S100 to S300.
In one embodiment, the first scene corresponding audio processing parameters include at least one of:
A preset value for filtering the superimposed audio signal;
The first direction information is used for indicating the sound collection directions of the first collection unit and the second collection unit;
The first noise reduction parameter is used for reducing noise of environmental noise in the first scene;
and the acquisition unit information is used for determining acquisition of the first acquisition unit and the second acquisition unit.
The preset value may be determined according to an actual scene, and may be an integer or a fraction in decibels.
The first direction information may indicate, for example, about 30 degrees on each side of the direction straight ahead of the terminal device, so that the sound collection direction covers a range of 60 degrees in total. Other directions are of course also possible, indicating other angles relative to straight ahead.
The first noise reduction parameters may include noise reduction related parameters such as maximum noise reduction depth, noise reduction coefficient, noise reduction strength, noise reduction radius, and the like.
The acquisition unit information may be identification information of the first acquisition unit and the second acquisition unit.
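The first-scene parameters listed above might be grouped as follows. Every concrete value (the 6 dB threshold, the ±30 degree range, the noise-reduction numbers, the unit identifiers) is an assumed example, not a value taken from the disclosure.

```python
# Assumed example of the audio processing parameters for the first scene.
first_scene_params = {
    "preset_value_db": 6,                    # threshold for filtering the superimposed signal
    "direction_deg": (-30, 30),              # ±30 degrees around straight ahead (60 in total)
    "noise_reduction": {                     # first noise reduction parameter
        "max_depth_db": 25,
        "strength": 0.8,
    },
    "acquisition_units": ["mic1", "mic2"],   # identifiers of the two acquisition units
}

def pickup_range_deg(params):
    """Total width of the sound collection range indicated by the direction info."""
    lo, hi = params["direction_deg"]
    return hi - lo
```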
In one embodiment, the acquisition scene further comprises:
and in the second scene, audio signal acquisition is performed on the plurality of sound sources through at least one audio acquisition unit in the terminal equipment.
The second scene differs from the first scene: in the second scene, audio signals emitted by a plurality of sound sources are collected through one or more acquisition units.
Illustratively, the acquisition scene may be determined to be the second scene according to the operation of the multi-person scene acting on the interface shown in fig. 6. The second scene includes the multi-person scene shown in fig. 6.
For example, S100 to S300 may not be required to be performed in the second scenario.
In one embodiment, the audio processing parameters corresponding to the second scene include at least one of:
Acquisition unit information for determining an acquisition unit for acquiring multiple sound sources;
The second direction information is used for collecting the sound of the plurality of sound sources;
Distance information for determining a collection distance for the sound source;
and the second noise reduction parameter is used for reducing noise of the environmental sound of the second scene. Reference may be made to the content comprised by the first noise reduction parameter.
The acquisition unit information may be identification information of an acquisition unit that acquires a plurality of sound sources.
The second direction information may be similar to the first direction information, though the specific direction indicated may differ. For example, the second direction information may indicate that audio signals emitted by a plurality of sound sources within 360 degrees around the terminal device are to be acquired, i.e., the acquisition direction covers 360 degrees.
The distance information can be determined according to the performance parameters of the acquisition units and/or the call requirements, for example an area with a radius of 5 meters centered on the terminal device, so that the maximum acquisition distance is 5 meters.
The audio signal may be collected according to audio processing parameters corresponding to the second scene.
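A matching sketch for the second scene, echoing the 360-degree and 5-metre examples above; the field names and the remaining values are assumptions.

```python
# Assumed example of the audio processing parameters for the second scene.
second_scene_params = {
    "acquisition_units": ["mic1", "mic2", "mic3", "mic4"],  # units picking up multiple sources
    "direction_deg": (0, 360),    # second direction information: all around the device
    "max_distance_m": 5.0,        # distance information: 5 m radius around the terminal
    "noise_reduction": {"strength": 0.5},  # second noise reduction parameter
}

def source_in_range(distance_m, params):
    """Whether a sound source lies within the configured collection radius."""
    return 0.0 <= distance_m <= params["max_distance_m"]
```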
In one embodiment, the acquisition scene further comprises:
and the third scene is used for acquiring audio based on the audio acquisition peripheral independent of the terminal equipment.
Referring to the earpiece noise reduction mode shown in fig. 6, the acquisition scene after the speaker/earpiece noise reduction switch is turned on is the third scene. The third scene differs from the first and second scenes in that audio is not collected through an acquisition unit on the terminal device but through an audio acquisition peripheral independent of the terminal device.
For example, the audio signals around the headphones are collected by a wireless headset that is independent of the terminal device, and the headphones then transmit the collected audio signals to the terminal device.
In one embodiment, the audio processing parameters corresponding to the third scene include at least one of:
The first indication information is used for indicating to use an audio acquisition peripheral connected with the terminal equipment to acquire an audio signal;
The third noise reduction parameter is used for indicating to close the environment noise reduction function when the acquisition unit on the terminal equipment is used for audio acquisition;
And the second indication information is used for indicating the acquisition unit of the terminal equipment to acquire the environmental sound.
When the audio processing parameters corresponding to the third scene include the first indication information, whether to collect the audio signal using the audio acquisition peripheral connected to the terminal device can be determined according to the first indication information.
The third noise reduction parameter may refer to what the first noise reduction parameter includes.
When the audio processing parameters corresponding to the third scene include the second indication information, the environmental sound can be collected through the acquisition unit of the terminal device according to the second indication information.
The first indication information and the second indication information may exist simultaneously, and the audio signal is collected through the audio collection peripheral and the collection unit.
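The two indication items could drive capture routing as sketched below; the field names, and the rule that both routes may be active at once (as the preceding paragraph notes), are illustrative assumptions.

```python
# Hypothetical routing for the third scene based on the indication information.
def select_capture_routes(params):
    """Return the active capture routes for the third scene."""
    routes = []
    if params.get("use_peripheral"):    # first indication information
        routes.append("audio_peripheral")
    if params.get("capture_ambient"):   # second indication information
        routes.append("device_microphones")
    return routes
```

When both indications are present, the peripheral carries the call audio while the built-in acquisition units collect the environmental sound.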
In one embodiment, referring to fig. 7, there is shown a schematic diagram of an audio signal processing apparatus, the apparatus comprising:
A first acquisition module 1, configured to acquire a first audio signal through a first acquisition unit and a second audio signal through a second acquisition unit, wherein the distance difference between the first and second acquisition units and a first sound source is within a first range, the distance difference between the first and second acquisition units and a second sound source is greater than the maximum value of the first range, and the first sound source is a target acquisition object;
A superposition module 2, configured to superimpose the first audio signal and the second audio signal in a time domain to obtain a superimposed audio signal;
And the filtering module 3 is used for filtering noise audio signals with the amplitude smaller than a preset value in the superimposed audio signals to obtain target audio signals of the target acquisition objects.
In one embodiment, the terminal device comprises a rectangular frame and a rear shell, wherein the rectangular frame comprises a group of long sides parallel to each other and a group of short sides parallel to each other;
when the terminal equipment is in a horizontal screen posture, the acquisition surface of the first acquisition unit and the acquisition surface of the second acquisition unit are positioned at different positions on the same long side;
When the terminal equipment is in a vertical screen posture, the acquisition surface of the first acquisition unit and the acquisition surface of the second acquisition unit are positioned on different sides of the rectangular frame;
when the terminal equipment is in an inclined state or a vertical posture, the acquisition surface of the first acquisition unit is positioned on the rectangular frame, and the acquisition surface of the second acquisition unit is positioned on the rear shell.
In one embodiment, the apparatus further comprises:
the scene determining module is used for determining an acquisition scene;
A parameter determining module, configured to set audio processing parameters of the terminal device according to the current acquisition scene, wherein the audio processing parameters are at least used for determining an acquisition unit and/or acquisition parameters for audio acquisition in the acquisition scene;
The acquisition scene comprises a first scene, wherein the first scene is a scene for carrying out audio acquisition on a single sound source;
And in the first scene, the audio processing parameters are at least used for determining that the first acquisition unit and the second acquisition unit acquire audio and filter the audio.
In one embodiment, the first scene corresponding audio processing parameters include at least one of:
The preset value;
The first direction information is used for indicating the sound collection directions of the first collection unit and the second collection unit;
And the first noise reduction parameter is used for reducing noise of the environmental sound in the first scene.
In one embodiment, the acquisition scene further comprises:
And in the second scene, audio signal acquisition is carried out on a plurality of sound sources through at least one audio acquisition unit in the terminal equipment.
In one embodiment, the audio processing parameters corresponding to the second scene include at least one of:
Acquisition unit information for determining an acquisition unit for acquiring multiple sound sources;
The second direction information is used for collecting the sound of the plurality of sound sources;
Distance information for determining a collection distance for the sound source;
and the second noise reduction parameter is used for reducing noise of the environmental sound of the second scene.
In one embodiment, the acquisition scene further comprises:
and the third scene is used for acquiring audio based on an audio acquisition peripheral independent of the terminal equipment.
In one embodiment, the audio processing parameters corresponding to the third scene include at least one of:
the first indication information is used for indicating to use an audio acquisition peripheral connected with the terminal equipment to acquire audio signals;
The third noise reduction parameter is used for indicating to close the environment noise reduction function when the collection unit on the terminal equipment is used for audio collection;
And the second indication information is used for indicating the acquisition unit of the terminal equipment to acquire the environmental sound.
In one embodiment, the scene determination module is further to:
Determining the acquisition scene according to the input operation on the terminal equipment, or
And determining the acquisition scene according to whether an audio acquisition peripheral connected with the terminal equipment exists.
It should be noted that, the "first" and "second" in the embodiments of the present disclosure are merely for convenience of expression and distinction, and are not otherwise specifically meant.
In one embodiment, figs. 8 and 9 are schematic diagrams of two different audio signal processing methods; the conference toolbox is a VoIP-based application, and the ADSP is an advanced digital signal processor. When the VoIP app initiates a call, the pickup mode defaults to the multi-person scene, and audio signal processing parameters, such as a remote_record_mode parameter, are issued to the audio hardware abstraction layer (AudioHAL). Upon receiving the audio signal processing parameters, AudioHAL configures the VoIP upstream audio path and the corresponding audio signal processing parameters (the algorithms in figs. 8 and 9) based on the microphone combinations shown in figs. 2 and 3.
For example, during a conference, if the user needs to suppress surrounding sounds, the user may select the single-person scene shown in fig. 6; the audio signal processing parameters for the single-person scene, such as a remote_record_mode parameter, are then issued. After receiving these parameters, AudioHAL adjusts the microphone combination, opens the microphone on the back of the tablet, and modifies the audio signal processing parameters so that the tablet performs directional pickup only within a certain angular range in front of it while suppressing sound from the back and the sides.
The gyroscope of the terminal senses whether the tablet is currently in a landscape or portrait state. In the single-person scene, the data collected by the 4 microphones is sent to a voice algorithm for processing; the principle of the algorithm can be referred to S100 to S300, and an algorithm with the same principle may be contained in it. In landscape mode, sounds from the two sides and the back of the tablet are suppressed according to the data of the omnidirectional microphones mic1, mic2 and mic3. In portrait mode, this would suppress only the sound on the side of the long edge where mic1 and mic2 are located, so the audio data of the omnidirectional microphone mic4 on the short edge can additionally be read to suppress sound from the other side, optimizing the pickup effect of the portrait mode in the single-person scene.
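The orientation-dependent microphone choice described above can be summarised in a small selector. The mapping follows the mic1-mic4 layout of figs. 2 and 3, but the function itself is a sketch under assumed labels, not the disclosed algorithm.

```python
# Sketch of orientation-dependent microphone selection in the single-person scene.
def select_mics(scene, orientation):
    """Choose which omnidirectional microphones feed the voice algorithm."""
    if scene != "single":
        # Multi-person pickup: which units are used is scene-dependent;
        # two long-side mics are an assumed default here.
        return ["mic1", "mic2"]
    if orientation == "landscape":
        # Suppress the sides and the back using mic1/mic2 (long side)
        # plus mic3 (rear shell).
        return ["mic1", "mic2", "mic3"]
    # Portrait: additionally read mic4 on the short side to suppress
    # sound from the remaining direction.
    return ["mic1", "mic2", "mic3", "mic4"]
```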
Fig. 10 is a block diagram of a terminal device according to an exemplary embodiment. For example, the terminal device may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 10, a terminal device can include one or more of a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the terminal device, such as operations associated with presentation, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, contact data, phonebook data, messages, pictures, video, etc. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power component 806 provides power to the various components of the terminal device. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal devices.
The multimedia component 808 includes a screen between the terminal device and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or sliding action, but also the duration and pressure associated with the touch or sliding operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the terminal device is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the terminal device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to, a home button, a volume button, an activate button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects for the terminal device. For example, the sensor assembly 814 may detect an on/off state of the terminal device, a relative positioning of the assemblies, such as a display and keypad of the terminal device, the sensor assembly 814 may also detect a change in position of the terminal device or one of the assemblies of the terminal device, the presence or absence of user contact with the terminal device, an orientation or acceleration/deceleration of the terminal device, and a change in temperature of the terminal device. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the terminal device and other devices, either wired or wireless. The terminal device may access a wireless network based on a communication standard, such as Wi-Fi,4G or 5G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal device may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any adaptations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

CN202311023056.4A | 2023-08-14 | Audio signal processing method, device, terminal equipment and storage medium | Pending | CN119497017A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202311023056.4A | 2023-08-14 | 2023-08-14 | Audio signal processing method, device, terminal equipment and storage medium


Publications (1)

Publication Number | Publication Date
CN119497017A (en) | 2025-02-21

Family

ID=94623714

Family Applications (1)

Application Number | Priority Date | Filing Date | Status | Publication | Title
CN202311023056.4A | 2023-08-14 | 2023-08-14 | Pending | CN119497017A (en) | Audio signal processing method, device, terminal equipment and storage medium

Country Status (1)

Country | Link
CN | CN119497017A (en)

Citations (5)

* Cited by examiner, † Cited by third party

Publication number | Priority date / Publication date | Assignee | Title
US20100232616A1 (en)* | 2009-03-13 / 2010-09-16 | Harris Corporation | Noise error amplitude reduction
CN103456305A (en)* | 2013-09-16 / 2013-12-18 | Dongguan Yulong Communication Technology Co., Ltd. | Terminal and speech processing method based on multiple sound collecting units
JP2014207589A (en)* | 2013-04-15 / 2014-10-30 | Sharp Corporation | Voice input apparatus and image display apparatus
CN106210983A (en)* | 2016-07-11 / 2016-12-07 | Goertek Inc. | Method, device and earphone for implementing a karaoke function through an earphone
CN109767766A (en)* | 2019-01-23 / 2019-05-17 | Hisense Group Co., Ltd. | Speech recognition method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party

Luo Qian, "Experimental design and implementation of microphone-array sound source localization", Experimental Technology and Management (实验技术与管理), vol. 37, no. 06, 17 June 2020, pages 185-188 *


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
