
Voice noise reduction method, device, equipment and storage medium

Info

Publication number
CN113327626B
CN113327626B
Authority
CN
China
Prior art keywords
voice
recognition model
scene
scene recognition
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110699792.6A
Other languages
Chinese (zh)
Other versions
CN113327626A (en)
Inventor
汪雪
黄石磊
程刚
何竹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Raisound Technology Co ltd
Original Assignee
Shenzhen Raisound Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Raisound Technology Co ltd
Priority to CN202110699792.6A
Publication of CN113327626A
Application granted
Publication of CN113327626B
Legal status: Active
Anticipated expiration


Abstract

The application relates to the technical field of audio processing, and discloses a voice noise reduction method comprising the following steps: obtaining voice data; inputting the voice data into a preset standard scene recognition model to determine the voice scene corresponding to the voice data, wherein the standard scene recognition model is obtained by training on noise sample sets in various scenes; and selecting a preset noise reduction model corresponding to the voice scene to reduce noise in the voice data. In addition, the application discloses a corresponding voice noise reduction apparatus, device and storage medium. The application can improve the accuracy of voice noise reduction.

Description

Voice noise reduction method, device, equipment and storage medium
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to a voice noise reduction method, apparatus, device and storage medium.
Background
Everyday life is filled with all kinds of noisy audio data, such as noise audio data recorded at a roadside, in a park, or in an office. The audio characteristics of noise differ between voice scenes, and the voice noise reduction means that should be adopted differ accordingly.
Existing voice noise reduction methods reduce noise based on the user's voiceprint features: for example, the user's speech in the voice data is enhanced according to the voiceprint features, so that the background noise is relatively weakened and voice noise reduction is completed. However, in practical application scenes, when the volume of the background noise is too large, such methods cannot weaken the background noise by enhancing the user's speech, so the noise reduction accuracy is not high.
Disclosure of Invention
In order to solve, or at least partially solve, the technical problems described above, the present application provides a voice noise reduction method, apparatus, device and storage medium.
In a first aspect, the present application provides a method for noise reduction in speech, the method comprising:
acquiring voice data;
inputting the voice data into a preset standard scene recognition model, and determining a voice scene corresponding to the voice data, wherein the standard scene recognition model is obtained by training according to noise sample sets in various scenes;
and selecting a preset noise reduction model corresponding to the voice scene, and carrying out noise reduction on the voice data.
In one embodiment of the first aspect, before the step of obtaining the voice data, the method includes:
collecting noise sample sets in each scene, and extracting audio features from each noise sample;
performing cluster analysis on the noise sample set based on the audio features to obtain a classified speech set;
dividing the classified voice set into a training voice set and a test voice set, constructing the scene recognition model by using the training voice set, and performing test adjustment on the scene recognition model by using the test voice set to obtain a standard scene recognition model.
In one embodiment of the first aspect, after the step of dividing the classified voice set into a training voice set and a test voice set, constructing the scene recognition model by using the training voice set, and performing test adjustment on the scene recognition model by using the test voice set to obtain a standard scene recognition model, the method further includes:
and establishing a noise reduction model corresponding to each scene for calling according to the collected noise sample set in each scene.
In one embodiment of the first aspect, the constructing a scene recognition model by using the training speech set includes:
calculating a Gini index between each feature label and the corresponding training voice set to obtain a Gini index set corresponding to the feature labels, wherein a feature label is the category label of the audio features extracted from the noise sample set in each scene;
sorting the Gini index set in descending order, and selecting the label corresponding to the smallest Gini index in the Gini index set as the split point;
taking the split point as the root node of an initial decision tree, generating child nodes from the split point, and distributing the training voice set to the child nodes until all of the feature labels have been traversed, so as to generate the initial decision tree;
pruning the initial decision tree to obtain the scene recognition model.
In one embodiment of the first aspect, pruning the initial decision tree to obtain a scene recognition model includes:
calculating surface error gain values of all non-leaf nodes on the initial decision tree;
pruning the non-leaf nodes with the surface error gain values smaller than a preset gain threshold value to obtain a scene recognition model.
In one embodiment of the first aspect, the performing test adjustment on the scene recognition model by using the test voice set to obtain a standard scene recognition model includes:
performing scene recognition processing on the test voice set by using the scene recognition model to obtain a recognition result corresponding to the test voice set;
and when the recognition result corresponding to the test voice set is inconsistent with the feature label corresponding to the test voice set, training the scene recognition model by utilizing the training voice set again until the recognition result corresponding to the test voice set is consistent with the feature label corresponding to the test voice set, and obtaining a standard scene recognition model.
In one embodiment of the first aspect, the performing cluster analysis on the noise sample set based on the audio feature to obtain a classified speech set includes:
acquiring preset standard features, and calculating a conditional probability value between the audio features and the standard features;
and sorting each noise sample in the noise sample set according to the size of the conditional probability value, and dividing the sorted noise sample set by taking a preset audio interval as a dividing point to obtain a classified voice set.
In one embodiment of the first aspect, collecting a set of noise samples in each scene, and extracting audio features from each noise sample includes:
pre-emphasis processing, framing processing, windowing processing and fast Fourier transformation are carried out on the noise sample set, so that a short-time frequency spectrum of the noise sample set is obtained;
taking the modulus square of the short-time frequency spectrum to obtain the power spectrum of the noise sample set;
and calculating the power spectrum by using a triangle filter bank with a preset Mel scale to obtain logarithmic energy, and performing discrete cosine transform on the logarithmic energy to obtain the audio characteristics corresponding to each noise sample.
In a second aspect, the present application provides a voice noise reduction apparatus, the apparatus comprising:
the voice data acquisition module is used for acquiring voice data;
the voice scene recognition module is used for inputting the voice data into a preset standard scene recognition model, and determining a voice scene corresponding to the voice data, wherein the standard scene recognition model is obtained by training according to noise sample sets in various scenes;
the noise reduction module is used for selecting a preset noise reduction model corresponding to the voice scene and reducing noise of the voice data.
In a third aspect, a voice noise reduction device is provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the voice noise reduction method according to any embodiment of the first aspect when executing the program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, implements the steps of the speech noise reduction method according to any one of the embodiments of the first aspect.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
according to the embodiment of the application, the acquired voice data is input into the preset standard scene recognition model, the voice scene corresponding to the voice data is recognized by utilizing the standard scene recognition model, the voice environment where the voice data is positioned can be determined by recognizing the voice scene corresponding to the voice data, the preset noise reduction model corresponding to the voice scene is selected, the voice data is subjected to noise reduction, and the noise reduction operation is performed more accurately by the noise reduction model matched with the scene, so that the purpose of improving the accuracy of voice noise reduction is achieved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a voice noise reduction method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of test adjustment for a scene recognition model in a voice noise reduction method according to an embodiment of the present application;
fig. 3 is a schematic block diagram of a device for voice noise reduction according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device for voice noise reduction according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Fig. 1 is a flow chart of a voice noise reduction method according to an embodiment of the present application. In this embodiment, the voice noise reduction method includes:
s1, acquiring voice data.
In the embodiment of the application, the voice data is noise-containing audio data to be subjected to noise reduction processing before subsequent audio processing such as voice recognition. Specifically, the voice data may be audio data collected in any voice scene.
Further, before the step of acquiring the voice data, the method includes:
collecting noise sample sets in each scene, and extracting audio features from each noise sample;
performing cluster analysis on the noise sample set based on the audio features to obtain a classified speech set;
dividing the classified voice set into a training voice set and a test voice set, constructing the scene recognition model by using the training voice set, and performing test adjustment on the scene recognition model by using the test voice set to obtain a standard scene recognition model.
In detail, in the embodiment of the present application, the noise sample set includes noise audio data in each voice scene, for example, noise audio data in a park, at a roadside, or in an office. In the embodiment of the present application, the noise sample set may further include the feature label corresponding to each noise sample, where the feature labels are used to label each noise sample so that the corresponding audio features can be extracted. The audio features may include the zero-crossing rate, mel-frequency cepstrum coefficients, spectral centroid, spectral spread, spectral entropy, spectral flux, and the like; in the embodiment of the application the audio features are preferably mel-frequency cepstrum coefficients (MFCCs).
Specifically, the collecting the noise sample set under each scene, extracting the audio feature from each noise sample includes:
pre-emphasis processing, framing processing, windowing processing and fast Fourier transformation are carried out on the noise sample set, so that a short-time frequency spectrum of the noise sample set is obtained;
taking the modulus square of the short-time frequency spectrum to obtain the power spectrum of the noise sample set;
and calculating the power spectrum by using a triangle filter bank with a preset Mel scale to obtain logarithmic energy, and performing discrete cosine transform on the logarithmic energy to obtain the audio characteristics corresponding to each noise sample.
In an alternative embodiment of the present application, the noise sample set is pre-emphasized by a preset high-pass filter to obtain a high-frequency-enhanced noise sample set; the pre-emphasis processing enhances the high-frequency part of the voice signal in the noise sample set.
Pre-emphasizing the noise sample set in the embodiment of the application highlights the formants in the high-frequency part of the noise samples.
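The patent does not give the filter itself; the minimal sketch below assumes the common first-order pre-emphasis filter y(n) = x(n) − 0.97·x(n−1), which is one standard realization of such a high-pass filter:

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    # y[n] = x[n] - alpha * x[n-1]: boosts the high-frequency part of the signal
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])
```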
In an optional embodiment of the present application, the pre-emphasized noise sample set is segmented into frames of a preset number of sampling points, so as to obtain a framed data set;
preferably, in the embodiment of the present application, each frame contains 512 or 256 sampling points.
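A minimal framing sketch; the patent specifies the frame size (512 or 256 sampling points) but not the frame shift, so the 50% overlap here is an assumption:

```python
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    # Split the signal into overlapping frames; tail samples that do not
    # fill a whole frame are dropped (assumes len(signal) >= frame_len).
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
```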
In an optional embodiment of the present application, the windowing process is performed on each frame in the frame data set according to a preset window function, so as to obtain a windowed signal.
In detail, the preset window function is:
S′(n)=S(n)×W(n)
where S′(n) is the windowed signal, S(n) is the framed data, W(n) is the window function, N is the size of the frame, and n is the sample index within the frame (0 ≤ n ≤ N − 1).
Preferably, in the embodiment of the present application, the preset window function may be a Hamming window, with W(n) being the functional expression of the Hamming window.
According to the embodiment of the application, windowing the framed data set increases the continuity between the left and right ends of each frame and reduces spectral leakage.
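A sketch of the windowing step, applying S′(n) = S(n) × W(n) with numpy's built-in Hamming window:

```python
import numpy as np

def apply_window(frames: np.ndarray) -> np.ndarray:
    # S'(n) = S(n) * W(n), with W(n) a Hamming window of frame size N
    return frames * np.hamming(frames.shape[1])
```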
Further, the embodiment of the application performs the fast Fourier transform using the following formula:

S(k) = Σ_{n=0}^{N−1} S′(n)·e^{−j2πkn/N}, 0 ≤ k ≤ N − 1

and takes the modulus square of the short-time spectrum using the following formula:

P(k) = |S(k)|²

where S(k) is the short-time spectrum, P(k) is the power spectrum, S′(n) is the windowed signal, N is the size of the frame, n is the sample index within the frame, and k is the frequency index on the short-time spectrum.
Since the characteristics of a signal are often difficult to observe from its transformation in the time domain, the embodiments of the present application transform the noise sample set into an energy distribution in the frequency domain, where different energy distributions can represent the characteristics of different voices.
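The two formulas above can be realized with a real FFT; keeping only the non-negative frequency bins is an implementation choice, not something the patent prescribes:

```python
import numpy as np

def power_spectrum(windowed_frames: np.ndarray) -> np.ndarray:
    spectrum = np.fft.rfft(windowed_frames, axis=1)  # short-time spectrum S(k)
    return np.abs(spectrum) ** 2                     # power spectrum P(k) = |S(k)|^2
```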
Further, in an embodiment of the present application, the logarithmic energy of the triangular filter bank with Mel scale is:

T(m) = ln( Σ_{k=0}^{N−1} P(k)·H_m(k) )

where T(m) is the logarithmic energy of the m-th filter, P(k) is the power spectrum, H_m(k) is the frequency response of the m-th triangular filter, N is the frame size, and k is the frequency index on the short-time spectrum.
According to the embodiment of the application, the triangular filters are utilized to calculate the logarithmic energy of the power spectrum, so that the short-time spectrum is smoothed, harmonics are eliminated, and the formants in the voice information are highlighted.
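A sketch of the Mel filter bank and the final discrete cosine transform step; the filter count (26) and the number of retained coefficients (13) are conventional defaults that the patent does not specify:

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> np.ndarray:
    # Triangular filters H_m(k) spaced evenly on the Mel scale
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[m - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    return fbank

def mfcc(power: np.ndarray, sample_rate: int,
         n_filters: int = 26, n_coeffs: int = 13) -> np.ndarray:
    n_fft = (power.shape[1] - 1) * 2                  # undo the rfft bin count
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energy = np.log(power @ fbank.T + 1e-10)      # T(m), with a floor for stability
    return dct(log_energy, type=2, axis=1, norm="ortho")[:, :n_coeffs]
```

Chaining the five sketches, mfcc(power_spectrum(apply_window(frame_signal(pre_emphasis(x)))), 16000) yields one MFCC vector per frame of a 16 kHz signal x.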
Specifically, the performing cluster analysis on the noise sample set based on the audio feature to obtain a classified speech set includes:
acquiring preset standard features, and calculating correlation coefficients between the audio features and the standard features;
and sorting each noise sample in the noise sample set according to the size of the correlation coefficient, and dividing the sorted noise sample set by taking a preset audio interval as a dividing point to obtain a classified voice set.
Wherein the classified speech set includes speech in different scenes, for example, speech in a road scene, speech in a park scene, and the like.
In detail, the correlation coefficient between the audio feature corresponding to each noise sample in the noise sample set and the standard feature is calculated using the following formula:

q_ij = exp(−‖y_i − y_j‖²) / Σ_{k≠l} exp(−‖y_k − y_l‖²)

where q_ij is the correlation coefficient, y_i is the audio feature corresponding to the noise sample, y_j is the standard feature, exp is the exponential function, and y_k and y_l run over the feature pairs in the normalization sum.
Specifically, the cluster analysis performed on the original noise sample set embeds noise samples distributed in a high-dimensional space into a certain low-dimensional subspace, keeping the data in the low-dimensional space as consistent as possible with their characteristics in the high-dimensional space. The cluster analysis preserves the global clustering characteristics of the high-dimensional data in the low-dimensional space, and the clustering relationships of the various noise samples can be analyzed visually, so that noise samples with similar time-frequency-domain characteristics are grouped into one class for classification and recognition, which improves recognition accuracy.
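One way to realize the q_ij similarity above, assuming squared Euclidean distance between the MFCC feature vectors (the patent does not state the distance measure):

```python
import numpy as np

def pairwise_similarity(features: np.ndarray) -> np.ndarray:
    # q_ij = exp(-||y_i - y_j||^2), normalized over all pairs with k != l
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    sims = np.exp(-sq_dists)
    np.fill_diagonal(sims, 0.0)  # exclude the k == l terms from the normalization
    return sims / sims.sum()
```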
Further, the classified voice set is divided into a training voice set and a test voice set according to a preset dividing proportion; the training voice set is used to construct the scene recognition model, and the test voice set is used to test and adjust the scene recognition model to obtain the standard scene recognition model.
Preferably, the dividing proportion is training voice set : test voice set = 7:3.
The training voice set is used for subsequent model training and provides the samples for model fitting; the test voice set is used to adjust the hyperparameters of the model and to make a preliminary assessment of its capability, in particular its generalization ability.
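A minimal sketch of the 7:3 division using scikit-learn; the feature matrix and scene labels here are placeholders, and stratification is an assumption that keeps every scene represented in both sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

features = np.random.randn(1000, 13)          # placeholder MFCC feature vectors
scene_labels = np.random.randint(0, 3, 1000)  # placeholder scene labels
train_x, test_x, train_y, test_y = train_test_split(
    features, scene_labels, test_size=0.3,    # 7:3 dividing proportion
    stratify=scene_labels, random_state=0)
```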
Specifically, the constructing a scene recognition model by using the training voice set includes:
calculating a Gini index between each feature label and the corresponding training voice set to obtain a Gini index set corresponding to the feature labels, wherein a feature label is the category label of the audio features extracted from the noise sample set in each scene;
sorting the Gini index set in descending order, and selecting the label corresponding to the smallest Gini index in the Gini index set as the split point;
taking the split point as the root node of an initial decision tree, generating child nodes from the split point, and distributing the training voice set to the child nodes until all of the feature labels have been traversed, so as to generate the initial decision tree;
pruning the initial decision tree to obtain the scene recognition model.
Specifically, the calculating a Gini index between each feature label and the corresponding training voice set includes:
calculating the Gini index between each feature label and the training voice set corresponding to the feature label using the following formula:

Gini(p) = Σ_{k=1}^{K} p_k(1 − p_k) = 1 − Σ_{k=1}^{K} p_k²

where Gini(p) is the Gini index, p_k is the probability associated with the kth frame of data in the training voice set, and K is the number of frames in the training voice set.
In detail, the Gini index represents the impurity of the model: the smaller the Gini index, the lower the impurity and the better the feature.
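A minimal sketch of the Gini computation and split-point selection; grouping the training voice set into one candidate subset per feature label is assumed here:

```python
import numpy as np

def gini_index(labels: np.ndarray) -> float:
    # Gini(p) = 1 - sum_k p_k^2 over the class proportions of the set
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def pick_split_point(candidate_subsets: dict) -> str:
    # The feature label whose subset has the smallest Gini index becomes the split point
    return min(candidate_subsets, key=lambda name: gini_index(candidate_subsets[name]))
```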
Further, the pruning processing is performed on the initial decision tree to obtain a scene recognition model, which includes:
calculating surface error gain values of all non-leaf nodes on the initial decision tree;
pruning the non-leaf nodes with the surface error gain values smaller than a preset gain threshold value to obtain a scene recognition model.
In this embodiment of the present application, the preset gain threshold is 0.5.
Further, the calculating the surface error gain values of all non-leaf nodes on the initial decision tree includes:
calculating the surface error gain values of all non-leaf nodes on the initial decision tree using the following gain formula:

α = (R(t) − R(T)) / (N(T) − 1)

R(t) = r(t) × p(t)

where α represents the surface error gain value, R(T) represents the error cost of the leaf nodes of the subtree rooted at the non-leaf node, R(t) represents the error cost of the non-leaf node itself, N(T) represents the number of leaf nodes of the subtree, r(t) represents the error rate of the node, and p(t) represents the ratio of the number of samples at the node to the number of all samples.
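A direct transcription of the two formulas; the caller is assumed to supply the node's error rate r(t), its sample proportion p(t), the subtree's error cost R(T), and the subtree's leaf count N(T):

```python
def surface_error_gain(r_t: float, p_t: float,
                       subtree_error_cost: float, n_leaves: int) -> float:
    # alpha = (R(t) - R(T)) / (N(T) - 1), with R(t) = r(t) * p(t)
    node_error_cost = r_t * p_t
    return (node_error_cost - subtree_error_cost) / max(n_leaves - 1, 1)

GAIN_THRESHOLD = 0.5  # non-leaf nodes with alpha below this preset threshold are pruned
```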
Specifically, referring to fig. 2, the performing test adjustment on the scene recognition model by using the test voice set to obtain a standard scene recognition model includes:
s101, performing scene recognition processing on the test voice set by using the scene recognition model to obtain a recognition result corresponding to the test voice set;
s102, when the recognition result corresponding to the test voice set is inconsistent with the feature label corresponding to the test voice set, training the scene recognition model by utilizing the training voice set again until the recognition result corresponding to the test voice set is consistent with the feature label corresponding to the test voice set, and obtaining a standard scene recognition model.
Further, after the step of dividing the classified voice set into a training voice set and a test voice set, constructing the scene recognition model by using the training voice set, and performing test adjustment on the scene recognition model by using the test voice set to obtain a standard scene recognition model, the method further includes:
and establishing a noise reduction model corresponding to each scene for calling according to the collected noise sample set in each scene.
S2, inputting the voice data into a preset standard scene recognition model, and determining a voice scene corresponding to the voice data, wherein the standard scene recognition model is obtained by training according to noise sample sets in various scenes.
In the embodiment of the application, the acquired voice data is input into the preset standard scene recognition model, the preset standard scene recognition model carries out scene recognition processing on the voice data, and a voice scene corresponding to the voice data is output.
S3, selecting a preset noise reduction model corresponding to the voice scene, and carrying out noise reduction on the voice data.
In the embodiment of the application, the noise reduction models include a dynamic time warping model, a vector quantization model, a hidden Markov model, and the like. According to the voice scene corresponding to the voice data and the characteristics of each noise reduction model, the corresponding noise reduction model is selected to perform the noise reduction operation on the voice data, so as to obtain a noise reduction result.
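A sketch of the scene-to-model dispatch of steps S2/S3; the scene names are taken from the examples in the description, and the placeholder routine stands in for the trained DTW/VQ/HMM noise reduction models:

```python
import numpy as np

def placeholder_reducer(audio: np.ndarray) -> np.ndarray:
    # Stand-in for a trained noise reduction model (e.g. DTW, VQ or HMM based)
    return audio

# Hypothetical registry: one preset noise reduction model per voice scene
NOISE_REDUCERS = {
    "park": placeholder_reducer,
    "roadside": placeholder_reducer,
    "office": placeholder_reducer,
}

def denoise(audio: np.ndarray, scene: str) -> np.ndarray:
    # S3: select the noise reduction model matched to the recognized voice scene
    return NOISE_REDUCERS[scene](audio)
```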
According to the embodiment of the application, the acquired voice data is input into a preset standard scene recognition model, which recognizes the voice scene corresponding to the voice data; recognizing the voice scene determines the voice environment in which the voice data was recorded. The preset noise reduction model corresponding to the voice scene is then selected to reduce noise in the voice data, which improves the accuracy of voice noise reduction.
As shown in fig. 3, an embodiment of the present application provides a schematic block diagram of a speech noise reduction device 10, where the speech noise reduction device 10 includes: a voice data acquisition module 11, a voice scene recognition module 12 and a noise reduction module 13.
The voice data acquisition module 11 is configured to acquire voice data;
the voice scene recognition module 12 is configured to input the voice data into a preset standard scene recognition model, and determine a voice scene corresponding to the voice data, where the standard scene recognition model is obtained by training according to a noise sample set in each scene;
the noise reduction module 13 is configured to select a preset noise reduction model corresponding to the voice scene, and perform noise reduction on the voice data.
In detail, each module in the voice noise reduction device 10 in the embodiment of the present application adopts the same technical means as the voice noise reduction method described with respect to fig. 1 and can produce the same technical effects, which are not described again here.
As shown in fig. 4, an embodiment of the present application provides a voice noise reduction device, which includes a processor 111, a communication interface 112, a memory 113, and a communication bus 114, where the processor 111, the communication interface 112, and the memory 113 communicate with each other through the communication bus 114.
a memory 113 for storing a computer program;
in one embodiment of the present application, the processor 111 is configured to implement the voice noise reduction method provided in any one of the foregoing method embodiments when executing the program stored in the memory 113, and includes:
acquiring voice data;
inputting the voice data into a preset standard scene recognition model, and determining a voice scene corresponding to the voice data, wherein the standard scene recognition model is obtained by training according to noise sample sets in various scenes;
and selecting a preset noise reduction model corresponding to the voice scene, and carrying out noise reduction on the voice data.
The communication bus 114 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 114 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 112 is used for communication between the above-described electronic device and other devices.
The memory 113 may include a Random Access Memory (RAM) or a nonvolatile memory (non-volatile memory), such as at least one disk memory. Alternatively, the memory 113 may be at least one memory device located remotely from the processor 111.
The processor 111 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the steps of the voice noise reduction method provided in any one of the method embodiments described above.
In the above embodiments, the implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means from one website, computer, server, or data center to another. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., Solid State Disk (SSD)), among others.

It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

CN202110699792.6A | 2021-06-23 (priority) | 2021-06-23 (filed) | Voice noise reduction method, device, equipment and storage medium | Active | CN113327626B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110699792.6A / CN113327626B (en) | 2021-06-23 | 2021-06-23 | Voice noise reduction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110699792.6A / CN113327626B (en) | 2021-06-23 | 2021-06-23 | Voice noise reduction method, device, equipment and storage medium

Publications (2)

Publication Number | Publication Date
CN113327626A (en) | 2021-08-31
CN113327626B (en) | 2023-09-08

Family

ID: 77424416

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status | Publication
CN202110699792.6A | Voice noise reduction method, device, equipment and storage medium | 2021-06-23 | 2021-06-23 | Active | CN113327626B (en)

Country Status (1)

Country | Link
CN | CN113327626B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113516985A (en) * | 2021-09-13 | 2021-10-19 | 北京易真学思教育科技有限公司 | Speech recognition method, apparatus and non-volatile computer-readable storage medium
CN113793620B (en) * | 2021-11-17 | 2022-03-08 | 深圳市北科瑞声科技股份有限公司 | Voice noise reduction method, device and equipment based on scene classification and storage medium
CN114121023A (en) * | 2021-11-30 | 2022-03-01 | 深港产学研基地(北京大学香港科技大学深圳研修院) | Speaker separation method, speaker separation device, electronic equipment and computer readable storage medium
CN114666092B (en) * | 2022-02-16 | 2025-07-25 | 奇安信科技集团股份有限公司 | Real-time behavior security baseline data noise reduction method and device for security analysis
CN114566160B (en) * | 2022-03-01 | 2025-04-18 | 游密科技(深圳)有限公司 | Voice processing method, device, computer equipment, and storage medium
CN114333881B (en) * | 2022-03-09 | 2022-05-24 | 深圳市迪斯声学有限公司 | Audio transmission noise reduction method, device and medium based on environment self-adaptation
CN114974279B (en) * | 2022-05-10 | 2024-10-25 | 中移(杭州)信息技术有限公司 | Sound quality control method, device, equipment and storage medium
CN115331689B (en) * | 2022-08-11 | 2025-02-28 | 北京声智科技有限公司 | Training method, device, equipment, storage medium and product of speech noise reduction model
CN115831138A (en) * | 2022-09-30 | 2023-03-21 | 联想(北京)有限公司 | Audio information processing method and device and electronic equipment
CN115767389A (en) * | 2022-11-04 | 2023-03-07 | 西安讯飞超脑信息科技有限公司 | Audio signal processing method for digital hearing aid and digital hearing aid
CN116758934B (en) * | 2023-08-18 | 2023-11-07 | 深圳市微克科技有限公司 | Method, system and medium for realizing the intercom function of smart wearable devices
CN116994599A (en) * | 2023-09-13 | 2023-11-03 | 湖北星纪魅族科技有限公司 | Audio noise reduction method for electronic equipment, electronic equipment and storage medium
CN117202071B (en) * | 2023-09-21 | 2024-03-29 | 广东金海纳实业有限公司 | Testing method and system for noise reduction headphones

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US9619035B2 (en) * | 2011-03-04 | 2017-04-11 | Microsoft Technology Licensing, LLC | Gesture detection and recognition
US10515629B2 (en) * | 2016-04-11 | 2019-12-24 | Sonde Health, Inc. | System and method for activation of voice interactive services based on user state

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101710490A (en) * | 2009-11-20 | 2010-05-19 | 安徽科大讯飞信息科技股份有限公司 | Method and device for compensating noise for voice assessment
KR20120129421A (en) * | 2011-05-20 | 2012-11-28 | 고려대학교 산학협력단 | Apparatus and method of speech recognition for numbers
US8438029B1 (en) * | 2012-08-22 | 2013-05-07 | Google Inc. | Confidence tying for unsupervised synthetic speech adaptation
CN106611183A (en) * | 2016-05-30 | 2017-05-03 | 四川用联信息技术有限公司 | Method for constructing a Gini-coefficient and misclassification-cost-sensitive decision tree
KR20180046062A (en) * | 2016-10-27 | 2018-05-08 | 에스케이텔레콤 주식회사 | Method for speech endpoint detection using normalization and apparatus thereof
CN108181107A (en) * | 2018-01-12 | 2018-06-19 | 东北电力大学 | Wind turbine bearing mechanical fault diagnosis method accounting for multiple object classes
CN108198547A (en) * | 2018-01-18 | 2018-06-22 | 深圳市北科瑞声科技股份有限公司 | Voice endpoint detection method, device, computer equipment and storage medium
WO2019237519A1 (en) * | 2018-06-11 | 2019-12-19 | 平安科技(深圳)有限公司 | General vector training method, voice clustering method, apparatus, device and medium
CN109285538A (en) * | 2018-09-19 | 2019-01-29 | 宁波大学 | Mobile phone source identification method based on the constant-Q transform domain in an additive noise environment
CN110769111A (en) * | 2019-10-28 | 2020-02-07 | 珠海格力电器股份有限公司 | Noise reduction method, system, storage medium and terminal
CN111754988A (en) * | 2020-06-23 | 2020-10-09 | 南京工程学院 | Acoustic scene classification method based on attention mechanism and dual-path deep residual network
CN111933175A (en) * | 2020-08-06 | 2020-11-13 | 北京中电慧声科技有限公司 | Active voice detection method and system based on noise scene recognition
CN111916066A (en) * | 2020-08-13 | 2020-11-10 | 山东大学 | Random-forest-based voice tone recognition method and system
CN112614504A (en) * | 2020-12-22 | 2021-04-06 | 平安科技(深圳)有限公司 | Single-channel voice noise reduction method, system, equipment and readable storage medium
CN112863667A (en) * | 2021-01-22 | 2021-05-28 | 杭州电子科技大学 | Lung sound diagnosis device based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Speech emotion recognition and text sensitive-word detection for human-computer interaction; 涂晴宇; China Master's Theses Full-text Database (Information Science and Technology); I136-201 *

Also Published As

Publication number | Publication date
CN113327626A (en) | 2021-08-31


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
