CN116863956A - Robust snore detection method and system based on convolutional neural network - Google Patents

Robust snore detection method and system based on convolutional neural network

Info

Publication number
CN116863956A
CN116863956A
Authority
CN
China
Prior art keywords
snore
neural network
audio
residual
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310915532.7A
Other languages
Chinese (zh)
Inventor
刘鹏 (Liu Peng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chipintelli Technology Co Ltd
Original Assignee
Chipintelli Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chipintelli Technology Co Ltd
Priority to CN202310915532.7A
Publication of CN116863956A
Legal status: Pending

Abstract

A robust snore detection method and system based on a convolutional neural network. The method includes the following steps: S1, constructing a snore data set that includes a far-field environment; S2, extracting Mel-frequency cepstral coefficient (MFCC) features from the snore data set as training samples; S3, taking the MFCC features as input and extracting deep audio features with a residual convolutional neural network modeled on the ECAPA-TDNN structure; S4, inputting the deep audio features obtained in step S3 into a binary classifier to judge whether they correspond to the target snore, and training iteratively to obtain a snore detection model; S5, detecting the environmental audio in real time with the trained snore detection model. The disclosed method uses feature extraction with a lightweight residual neural network to build a robust, lightweight snore detection model, effectively improving the detection rate of current detection methods at medium and long distances.

Description

Robust snore detection method and system based on convolutional neural network
Technical Field
The invention belongs to the technical field of voice signal processing, and particularly relates to a robust snore detection method and system based on a convolutional neural network.
Background
Snoring during sleep degrades sleep quality and causes symptoms such as physical fatigue and drowsiness, and can also induce health problems such as hypertension, heart disease and diabetes. As people attach ever greater importance to physical health, the demand for good sleep keeps growing, so snore detection has a large application demand in fields such as smart wearable devices, smart homes and medical diagnosis.
With the rise of artificial intelligence, analyzing snore signals with deep learning has become a research hotspot: deep audio features extracted by convolutional or recurrent neural networks achieve good detection results. However, when such a model is embedded in a device, limited computing power and insufficient memory may prevent effective real-time processing and analysis, degrading detection accuracy. In addition, as market demand grows, some smart-home scenarios such as smart electric beds, unlike sleep pillows or wearable devices, place the microphone far from the target sound source, and detection accuracy may drop sharply when the distance is too large. A lightweight model and method with good generalization are therefore needed to solve these problems.
Disclosure of Invention
To address these problems, the invention discloses a robust snore detection method based on a convolutional neural network, aiming to solve the problems of excessive model size and poor stability.
the invention relates to a robust snore detection method based on a convolutional neural network, which comprises the following steps:
s1, constructing a snore data set comprising a far-field environment;
s2, extracting the mel cepstrum coefficient characteristics in the snore data set as a training sample;
s3, taking the mel cepstrum coefficient characteristic as input, and referring to an ECAPA-TDNN residual convolution neural network structure to extract the audio deep characteristic;
the ECAPA-TDNN residual convolutional neural network comprises a one-dimensional convolutional coding layer, a plurality of one-dimensional residual excitation network layers, a characteristic fusion layer, an attention statistic pooling layer and a linear layer which are connected in sequence; the output ends of the one-dimensional convolution coding layers are connected with the input ends of the one-dimensional residual excitation network layers, and the output ends of the one-dimensional residual excitation network layers are connected with the input ends of the feature fusion layer;
s4, inputting the audio deep features obtained in the step S3 into a two-classifier to judge whether the audio deep features are target snores, iteratively training the residual convolution neural network by taking the mel cepstrum coefficient features corresponding to the input audio deep features as training targets, updating network parameters of the residual convolution neural network in the training process, stopping training after reaching convergence conditions, and storing the residual convolution neural network to obtain a snore detection model;
s5, detecting the environmental audio in real time by using the trained snore detection model.
Preferably, step S1 specifically includes collecting original snore data, arranging it into fixed-length audio clips of equal duration, randomly selecting room impulse response audio and adding reverberation to each clip to expand the data, and randomly reducing the volume of the snore audio, thereby regenerating a snore data set for training.
Preferably, in step S2, short-time Fourier transform processing is performed on the audio in the snore data set to obtain a power spectrum, the power spectrum is filtered with a Mel filter bank, and finally the logarithm of the filtered power spectrum is taken and a discrete cosine transform is applied to obtain the Mel cepstral coefficient features.
Preferably, the Mel cepstral coefficient features obtained in step S2 are stored as binary files with the extension ".bin", and the binary files are labeled: the feature file corresponding to snore audio is labeled 0, and the feature file of non-snore audio is labeled 1.
Preferably, the one-dimensional convolution coding layer comprises a one-dimensional convolution network, a batch normalization layer and a nonlinear activation function ReLU layer which are connected in sequence.
Preferably, the step S5 specifically includes:
s51, judging whether sound exists in the environment through voice activity detection of the audio, if so, executing a step S52, otherwise, executing the step S51 again;
s52, setting a data storage queue, performing predictive scoring on input audio by using the snore detection model, if the scoring is greater than a threshold value, executing a step S53, otherwise, executing a step S54;
s53, marking the scoring result as 1 and adding the scoring result into a queue;
s54, marking the scoring result as-1 and adding the scoring result into a queue;
s55, judging whether the queue length is full, executing a step S56 if the queue length is full, and returning to the step S51 if the queue length is not full;
s56, calculating the sum of the marked numbers in the queue, if the sum is larger than 0, indicating that the snore is detected, otherwise, indicating that the snore is not detected.
The invention also discloses a robust snore detection system based on the convolutional neural network, comprising a data generation module, an MFCC (Mel-frequency cepstral coefficient) feature extraction module, a residual neural network module, a classifier module and a logic detection module connected in sequence;
the data generation module is used for constructing a snore data set comprising a far-field environment;
the MFCC feature extraction module is used for extracting the mel cepstrum coefficient features in the snore data set;
the residual convolutional neural network module comprises a one-dimensional convolutional coding layer, a plurality of one-dimensional residual excitation network layers, a characteristic fusion layer, an attention statistic pooling layer and a linear layer which are sequentially connected; the output ends of the one-dimensional convolution coding layers are connected with the input ends of the one-dimensional residual excitation network layers, and the output ends of the one-dimensional residual excitation network layers are connected with the input ends of the feature fusion layer;
the classifier module is used for judging whether the audio deep features output by the residual convolution neural network module correspond to snore features or not;
the logic detection module is used for detecting the environmental audio in real time by using the residual neutral network module.
Preferably, the classifier module is a SOFTMAX linear layer.
The snore detection method and system disclosed by the invention use feature extraction with a lightweight residual neural network to build a robust, lightweight snore detection model, effectively improving the detection rate of current detection methods at medium and long distances.
Drawings
FIG. 1 is a diagram showing the overall data flow of the snore detecting device of the present invention;
FIG. 2 is a schematic diagram showing a specific implementation flow of the snore detecting method according to the present invention;
FIG. 3 is a schematic diagram of a residual neural network module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a specific workflow of the detection logic of the present invention.
Description of the embodiments
For a more intuitive and clear description of the technical solution of the present invention, the following detailed description will be given with reference to specific embodiments and example drawings.
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely explained below in connection with the detailed description of the present invention and the corresponding drawings, and it is obvious that the described embodiments are only some, but not all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the overall structure of the snore detection device implementing the snore detection method of the present invention includes: a data generation module, an MFCC (Mel-frequency cepstral coefficient) feature extraction module, a residual neural network module, a classifier module and a logic detection module.
The data generation module, the MFCC feature extraction module and the classifier module are mainly used in the training process of the residual neural network, and the residual neural network module and the logic detection module are used for carrying out snore recognition on external audio after training is completed.
The data generation module cuts and sorts the snore data into training samples of 3 seconds through a script, and expands the data in a mode of randomly adding reverberation and reducing volume.
The MFCC feature extraction module performs a short-time Fourier transform on the signal to obtain a power spectrum, filters the power spectrum with a Mel filter bank, takes the logarithm and performs a discrete cosine transform to obtain the Mel cepstral coefficient features serving as training samples.
The residual neural network module takes the MFCC feature vector as input and processes it through the residual neural network; the final dimension of the output feature vector is [Batch_Size, 192], where Batch_Size is the number of training samples in each batch.
The classifier module consists of a linear layer: the features undergo a nonlinear transformation in the linear layer, the associations among the features are extracted, and finally the classification probability is mapped to the output space, fulfilling the role of the binary classifier. As an optional example, snoring is judged when the classification probability is greater than 0.53.
The logic detection module is mainly used to detect snoring in real time. To further reduce false detections and improve accuracy, each detection result is stored in a queue; when the number of detection results stored in the queue reaches a set value of 3, a judgment is made: if the detection results contain more snores than non-snores, snoring is considered to exist in that time period.
In order to better explain the specific scheme and advantages of the present invention, the following will further describe the details of the present invention in conjunction with the specific implementation procedure of the method, as shown in fig. 2, the snore detecting method of the present invention comprises the following steps:
s1, collecting and screening original snore data, aligning single data to a fixed length, randomly adding reverberation and reducing volume, and constructing a snore data set containing far-field environment.
Adding reverberation gives the original audio data far-field characteristics: the audio sounds reverberant, closer to snoring transmitted from a distance. Adapting the model to such audio is a prerequisite for detecting medium- and long-distance snoring; training with audio that has far-field characteristics gives the model stronger generalization ability and increases the detection distance. Compared with a model trained only on near-field audio, a model that has learned and adapted to the characteristics of far-field environmental audio detects distant snoring more easily.
In a specific embodiment, the original snore data consists of real recorded sleep snoring, snore audio downloaded from websites, relevant snore videos obtained from YouTube, and the like, stored in WAV format. Audio with poor quality, too small an amplitude or indistinct snore characteristics is removed, and the rest is organized into 3.5 hours of original snore data.
The screened original snore data is rearranged into fixed 3-second audio clips, a room impulse response is randomly selected from the MeshRIR reverberation data set to add reverberation and expand the data, and the volume of the snore audio is randomly reduced. Reducing the volume increases the richness of the training samples, so the model keeps a good detection rate for quieter snores; in this way 20 hours of data are regenerated as the snore data set for training.
Each snore clip can first be traversed, and several room impulse responses randomly selected from the reverberation data set can be used to add reverberation to it.
The original snore data set is thus enhanced with different kinds of room impulse responses, reverberation and random volume changes; far-field audio is simulated and generated for training, which improves the generalization performance and robustness of the model.
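As a minimal illustrative sketch of this augmentation (not the patent's code), a dry snore clip can be convolved with a randomly chosen room impulse response and attenuated by a random gain; the gain range and peak normalization are assumptions:

```python
# Sketch: reverberation via RIR convolution plus random volume reduction.
import numpy as np
from scipy.signal import fftconvolve

def augment(snore: np.ndarray, rirs: list, rng=None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    rir = rirs[rng.integers(len(rirs))]           # random room impulse response
    wet = fftconvolve(snore, rir)[: len(snore)]   # add reverberation, keep the 3 s length
    out = wet * rng.uniform(0.1, 1.0)             # randomly reduce volume (range assumed)
    peak = np.max(np.abs(out))
    return out / peak if peak > 1.0 else out      # guard against clipping
```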
In addition, the snore training data set further comprises non-snore audio; as an optional example, common noise in a home environment, such as television sound, music, human voices and fan noise, is selected as negative samples for training.
This step may be accomplished by the data generation module.
S2, extracting the Mel-frequency cepstral coefficient (MFCC) features from the snore data set as training samples.
As an alternative embodiment, the MFCC features are extracted before training and stored as binary files with the extension ".bin". The binary files are labeled: snore audio features are labeled '0' and non-snore audio features '1', telling the binary classification model which samples are snores. During training the binary files are read directly as input, which speeds up the training process. Let the input feature vector be F_mfcc with dimension Size [Batch_Size, Filters, T], where Filters is the number of Mel filters (Filters = 60) and T is the feature length (T = 200).
One specific procedure for MFCC feature extraction is as follows: perform short-time Fourier processing on the audio to obtain a power spectrum, filter the power spectrum with a Mel filter bank, take the logarithm of the filtered power spectrum, and perform a discrete cosine transform to obtain the Mel-frequency cepstral coefficient (MFCC) features, which serve as training samples in the subsequent steps.
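As an illustrative sketch (not the patent's implementation), this pipeline can be reproduced with librosa, which performs the STFT, mel filtering, log and DCT steps internally. The 60 filters and T = 200 follow the dimensions quoted above, while the 16 kHz sample rate, 512-point FFT and 240-sample hop (which gives roughly 200 frames for a 3-second clip) are assumptions:

```python
import numpy as np
import librosa

def extract_mfcc(path, sr=16000, n_mfcc=60, t=200):
    y, _ = librosa.load(path, sr=sr, duration=3.0)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=512, hop_length=240)   # ~200 frames for 3 s
    return mfcc[:, :t].astype(np.float32)                    # [Filters=60, T=200]

feat = extract_mfcc("snore_0001.wav")        # hypothetical file name
feat.tofile("snore_0001_label0.bin")         # ".bin" storage; label 0 = snore, 1 = non-snore
```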
The Mel-frequency cepstral coefficient (MFCC) is a classic speech signal processing feature that can, to a certain extent, simulate human auditory perception of sound, and it is widely applied in fields such as speech recognition, speaker recognition and emotion recognition. The invention therefore applies the good discriminative ability of MFCC features to snore detection, introduces the residual convolutional neural network architecture ECAPA-TDNN as the feature extraction network, and combines it with a binary classification network to form a complete lightweight snore detection model whose efficient structure meets the deployment requirements of embedded devices.
This step may be accomplished by the MFCC feature extraction module.
S3, taking the MFCC features as input, deep audio features are extracted with reference to the ECAPA-TDNN (Emphasized Channel Attention, Propagation and Aggregation in TDNN-Based Speaker Verification) residual convolutional neural network structure.
As shown in fig. 3, the residual convolutional neural network comprises a one-dimensional convolutional coding layer, a plurality of one-dimensional residual excitation network layers, a feature fusion layer, an attention statistic pooling layer and a linear layer which are sequentially connected;
the output ends of the one-dimensional convolution coding layers are connected with the input ends of the one-dimensional residual excitation network layers, and the output ends of the one-dimensional residual excitation network layers are connected with the input ends of the feature fusion layer.
The one-dimensional convolutional coding layer comprises a one-dimensional convolutional network, a batch normalization layer and a nonlinear activation function ReLU layer; as an optional example, the number of input channels of the one-dimensional convolutional network equals the number of Mel filters (Filters), and the number of output channels is C, with C = 64.
The input feature vector F_mfcc is fed into the one-dimensional convolutional coding layer; the convolutional network leaves the feature length unchanged and changes the number of channels to C. The output vector is expressed as follows:
F_conv=BatchNorm(ReLU(Conv_1d(F_mfcc)))
where Conv_1d denotes the one-dimensional convolutional network feature mapping, ReLU the nonlinear activation function, and BatchNorm the batch normalization operation; F_conv denotes the output vector of the one-dimensional convolutional coding layer, with dimension Size [Batch_Size, C, T].
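A minimal PyTorch sketch of this coding layer (an illustration only) follows the formula above; the kernel size of 5 is an assumption, with padding chosen so the feature length T is unchanged:

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    def __init__(self, filters=60, c=64, kernel=5):
        super().__init__()
        self.conv = nn.Conv1d(filters, c, kernel, padding=kernel // 2)  # keeps T
        self.bn = nn.BatchNorm1d(c)

    def forward(self, f_mfcc):                          # [Batch_Size, 60, T]
        return self.bn(torch.relu(self.conv(f_mfcc)))   # F_conv: [Batch_Size, C, T]
```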
The one-dimensional residual excitation network layer is composed of a residual network and a squeeze-excitation network. The residual network uses a multi-scale residual connection scheme: the input feature is divided into n smaller-scale features, n-1 convolution filters are allocated for weighting, the weighted features are residual-connected one by one, and finally all the small-scale features are spliced back into an output feature with the same dimension as the input. The residual network filters with a small convolution kernel but enlarges the receptive field through multi-scale processing, striking a balance between computational cost and multi-scale feature extraction.
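A sketch of this multi-scale residual filtering, in the Res2Net style that ECAPA-TDNN uses (n = 4 scales and a kernel size of 3 are assumptions, not values stated in the patent):

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    def __init__(self, c=64, scales=4, kernel=3):
        super().__init__()
        w = c // scales
        self.scales = scales
        self.convs = nn.ModuleList(
            nn.Conv1d(w, w, kernel, padding=kernel // 2) for _ in range(scales - 1)
        )

    def forward(self, x):                              # [Batch_Size, C, T]
        parts = torch.chunk(x, self.scales, dim=1)     # split into n smaller-scale features
        out, prev = [parts[0]], parts[0]
        for conv, part in zip(self.convs, parts[1:]):
            prev = conv(part + prev)                   # residual connection one by one
            out.append(prev)
        return torch.cat(out, dim=1)                   # splice back to [Batch_Size, C, T]
```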
The squeeze-excitation network automatically weights the channel dimension of the feature (the channel information), enhancing useful information and suppressing useless information, while requiring few parameters and little computation. Specifically, it compresses the input feature dimension [Batch_Size, C, T] to [Batch_Size, C, 1], which is equivalent to expanding the field of view to the channel dimension; it then uses its linear layers to make a linear prediction for each channel of the compressed feature, and finally multiplies the input feature by the predicted weight values, completing the squeeze-excitation process.
The squeeze-excitation network mentioned above belongs to the prior art and is derived from the literature "Squeeze-and-Excitation Networks" (Hu, J., Shen, L., Albanie, S., Sun, G., & Wu, E. (2017). Squeeze-and-Excitation Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 2011-2023).
The squeeze-excitation network first reduces the dimension of the input feature to obtain a feature weight vector over the channel dimension, then multiplies this weight vector with the input vector and restores the input feature dimension; the whole process is still a feature weighting process.
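A sketch of the squeeze-excitation weighting just described (the bottleneck width of 16 is an assumption):

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    def __init__(self, c=64, bottleneck=16):
        super().__init__()
        self.fc1 = nn.Linear(c, bottleneck)
        self.fc2 = nn.Linear(bottleneck, c)

    def forward(self, x):                                      # [Batch_Size, C, T]
        s = x.mean(dim=2)                                      # squeeze: [Batch_Size, C]
        w = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))   # per-channel weights
        return x * w.unsqueeze(2)                              # rescale channels
```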
Three one-dimensional residual excitation network layers are stacked with residual connections, and the three weighted excitation features they output are fed into the feature fusion layer, specifically expressed as follows:
F_res1 = ResBlock(F_conv)
F_res2 = ResBlock(F_conv + F_res1)
F_res3 = ResBlock(F_conv + F_res1 + F_res2)
where ResBlock denotes the feature mapping of a one-dimensional residual excitation network layer, and F_res1, F_res2 and F_res3 denote the output vectors of the respective layers, each of dimension [Batch_Size, C, T].
The feature fusion layer is connected to the output ends of the three residual excitation network layers by skip connections, these outputs serving as its inputs. The feature fusion layer splices the output vectors of the three one-dimensional residual excitation network layers along the channel dimension: the three weighted excitation features F_res1, F_res2 and F_res3, each of dimension [Batch_Size, C, T], are spliced into an output vector of dimension [Batch_Size, 3C, T] and then fed into a convolutional neural network to complete the feature fusion, specifically expressed as follows:
F_cat = Conv_1d(Cat(F_res1, F_res2, F_res3))
where Cat denotes the feature splicing operation along the channel dimension and Conv_1d the one-dimensional convolutional network feature mapping; the dimension of the output vector F_cat of the feature fusion layer is [Batch_Size, 3C, T].
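Putting the pieces together, a sketch of the three stacked residual excitation layers and the fusion step, following the formulas above. Here make_res_block is a factory for any module combining the multi-scale convolution and squeeze-excitation sketched earlier, e.g. lambda c: nn.Sequential(MultiScaleConv(c), SqueezeExcite(c)):

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    def __init__(self, make_res_block, c=64):
        super().__init__()
        self.res1 = make_res_block(c)
        self.res2 = make_res_block(c)
        self.res3 = make_res_block(c)
        self.fuse = nn.Conv1d(3 * c, 3 * c, kernel_size=1)   # feature fusion conv

    def forward(self, f_conv):                                # [Batch_Size, C, T]
        r1 = self.res1(f_conv)
        r2 = self.res2(f_conv + r1)
        r3 = self.res3(f_conv + r1 + r2)
        return self.fuse(torch.cat([r1, r2, r3], dim=1))      # F_cat: [Batch_Size, 3C, T]
```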
The attention statistics pooling layer consists of two convolutional neural networks that are identical except for their input and output channel configurations; through these two networks, the feature information needed before normalization is learned automatically.
The attention statistics pooling layer may refer to the prior art "Attentive Statistics Pooling for Deep Speaker Embedding" (Okabe, K., Koshinaka, T., & Shinoda, K. (2018). Attentive Statistics Pooling for Deep Speaker Embedding. Interspeech). The method is implemented with two convolutional neural networks: the first transforms the input vector to another dimension for weighting, and the second restores the feature dimension.
The attention statistics pooling layer uses an attention mechanism to compute the internal relations between vectors, obtains the weights required by the pooling layer through a normalization operation, and applies a weighted computation of the variance and standard deviation of the input data, increasing the distinction between target and non-target features. It is specifically expressed as follows:
F_attention=AttentionPool(F_cat)
where AttentionPool denotes the attention statistics pooling layer feature mapping, and F_attention the attention feature vector output by the layer, with dimension [Batch_Size, 6C] (the pooling collapses the time dimension, concatenating the weighted mean and standard deviation).
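A sketch of this attentive statistics pooling (the 128-channel bottleneck of the two 1x1 convolutions is an assumption): frame-level attention weights are computed, and the weighted mean and standard deviation are concatenated.

```python
import torch
import torch.nn as nn

class AttnStatsPool(nn.Module):
    def __init__(self, c3=192, bottleneck=128):
        super().__init__()
        self.att = nn.Sequential(
            nn.Conv1d(c3, bottleneck, 1), nn.Tanh(),   # first conv: map to attention space
            nn.Conv1d(bottleneck, c3, 1),              # second conv: restore feature dim
        )

    def forward(self, x):                               # [Batch_Size, 3C, T]
        w = torch.softmax(self.att(x), dim=2)           # frame-level attention weights
        mu = (x * w).sum(dim=2)                         # weighted mean  [Batch_Size, 3C]
        var = (x * x * w).sum(dim=2) - mu * mu
        sd = torch.sqrt(var.clamp(min=1e-8))            # weighted std   [Batch_Size, 3C]
        return torch.cat([mu, sd], dim=1)               # [Batch_Size, 6C]
```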
The linear layer applies a linear transformation to the attention feature vector to complete the feature dimension reduction and obtain the final extracted feature embedding vector, specifically expressed as follows:
F_embedding=Linear(F_attention)
where Linear denotes the linear layer feature mapping and F_embedding the feature embedding vector output by the linear layer; F_embedding is the deep audio feature output by this step, with dimension [Batch_Size, 192] (i.e. 3C with C = 64), consistent with the module output described earlier.
S4, the deep audio features obtained in step S3 are input into a binary classifier to judge whether they are the target snore; the residual convolutional neural network is iteratively trained on the corresponding input MFCC features, its network parameters are updated during training, training stops once the convergence condition is reached, and the network is saved to obtain the snore detection model.
As shown in fig. 3, the deep audio features output by the linear layer of the residual convolutional neural network are input into the classifier, implemented as a Softmax linear layer with input dimension 3C and output dimension 2; the procedure is as follows:
[out1, out2] = Softmax(Final_Layer(F_embedding))
where Final_Layer() denotes the feature mapping of the binary classifier, out1 denotes the probability score of snoring, and out2 the probability score of non-snoring. The high-dimensional feature embedding vector F_embedding is converted into a two-dimensional vector by a linear transformation, the Softmax function then yields the target-sound and non-target-sound probabilities, and a threshold decides whether the audio is a snore.
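A sketch of this classification head, using the 0.53 threshold quoted earlier as the optional example; the single-vector input shape is an assumption for illustration:

```python
import torch
import torch.nn as nn

classifier = nn.Linear(192, 2)   # input dim 3C = 192, two output classes

def is_snore(f_embedding, threshold=0.53):
    # f_embedding: a single embedding vector of shape [192]
    out1, out2 = torch.softmax(classifier(f_embedding), dim=-1)
    return out1.item() > threshold   # out1 = snore probability, out2 = non-snore
```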
S5, the environmental audio is detected in real time with the trained snore detection model; a detection logic module further judges the probability that snoring exists in the environment, improving the robustness of the method and reducing false detections.
Further, the judgment flow of the detection logic module is shown in fig. 4:
step S51, through voice activity detection on the audio stream, the MFCC characteristics of the input snore audio judge whether sound exists in the environment, if so, step S52 is executed, otherwise, step S51 is executed again.
Step S52: a data storage queue of a certain length is set (for example, length 10), and the snore detection model performs predictive scoring on the input audio stream; if the score is greater than the threshold, step S53 is executed, otherwise step S54.
Step S53: the scoring result is marked as '1' and added to the queue; as an alternative example, the queue length is set to 3, i.e. at most 3 scoring results are added.
Step S54: the scoring result is marked as '-1' and added to the queue.
Step S55: judge whether the queue is full (all 10 positions occupied); if full, execute step S56, otherwise return to step S51.
S56: calculate the sum of the marks in the queue; if the sum is greater than 0, snoring is detected and the result TRUE is returned, otherwise snoring is not detected and the result FALSE is returned.
This step may be performed by the logic detection module.
The logic detection module effectively improves the stability of the system and reduces false detections. Specifically, because a result is returned only when the queue is full, the model must judge comprehensively from several detection results, which is more stable than a system that outputs after a single detection. The queue length can also be set according to different requirements so the model meets more users' needs. For example, given a long non-snore audio input: if the detection queue length is set to 1, the model misjudges it as a snore whenever a single score occasionally exceeds the threshold; if the queue length is set to 3, then according to the detection logic at least 2 of the 3 detections must exceed the threshold before the audio is misjudged as a snore, which largely avoids returning a wrong result because of one occasional scoring error.
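The decision loop of steps S51-S56 can be sketched as follows (the queue length of 3 and the 0.53 threshold follow the examples above; voice activity detection is assumed to happen upstream):

```python
from collections import deque

class SnoreLogic:
    """Queue-based decision logic of steps S51-S56 (illustrative sketch)."""
    def __init__(self, queue_len=3, threshold=0.53):
        self.queue = deque(maxlen=queue_len)
        self.threshold = threshold

    def push_score(self, score):
        # S52-S54: mark the scored segment +1 (snore) or -1 (non-snore)
        self.queue.append(1 if score > self.threshold else -1)
        if len(self.queue) < self.queue.maxlen:
            return None                     # S55: queue not yet full
        verdict = sum(self.queue) > 0       # S56: majority of marks decides
        self.queue.clear()
        return verdict                      # True = snore detected
```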
In order to verify the effectiveness of the snore detection method disclosed by the invention, in a specific embodiment the method was ported to an embedded device to build an experimental environment. In an actual natural environment, automatically played snore audio was tested for about 20 minutes per distance, and the snore detection accuracy was counted at distances of 0.5 m, 1 m, 2 m and 3 m; the playback sound pressure level was about 55-65 dB. The test results are shown below:
Distance/m | Play duration/min | Total plays/times | Detected/times | Accuracy
0.5 | 20 | 168 | 160 | 95.24%
1 | 20 | 170 | 162 | 95.29%
2 | 19 | 157 | 144 | 91.72%
3 | 19 | 155 | 137 | 88.39%
Within 1 m the detection accuracy exceeds 95%; as the distance increases to 2 m the accuracy remains above 90%; at 3 m the accuracy drops slightly but is still 88.39%. The disclosed snore detection method therefore performs well at short, medium and long distances, verifying its effectiveness and stability.
The foregoing describes preferred embodiments of the present invention. Unless there is an obvious contradiction or a specific preferred embodiment is presupposed, the preferred embodiments may be used in any overlapping combination. The embodiments and the specific parameters therein serve only to clearly describe the inventor's verification process and are not intended to limit the scope of the invention, which remains defined by the claims; all equivalent structural changes made using the contents of the specification and drawings of the present invention fall within the scope of the invention.

Claims (8)

CN202310915532.7A | Priority date 2023-07-25 | Filing date 2023-07-25 | Robust snore detection method and system based on convolutional neural network | Pending | CN116863956A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310915532.7A | 2023-07-25 | 2023-07-25 | Robust snore detection method and system based on convolutional neural network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310915532.7A | 2023-07-25 | 2023-07-25 | Robust snore detection method and system based on convolutional neural network

Publications (1)

Publication Number | Publication Date
CN116863956A (en) | 2023-10-10

Family

ID=88221493

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202310915532.7A | Pending, CN116863956A (en) | 2023-07-25 | 2023-07-25

Country Status (1)

Country | Link
CN (1) | CN116863956A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN118465305A * | 2024-07-10 | 2024-08-09 | 南京大学 (Nanjing University) | Deep learning method and system for wind speed measurement based on surveillance camera audio data
CN118987423A * | 2024-08-15 | 2024-11-22 | 中南大学 (Central South University) | Automatic control method, device, computer equipment, readable storage medium and program product for breathing machine
CN119157494A * | 2024-10-24 | 2024-12-20 | 德沃康科技集团有限公司 | Snoring detection method, device, electronic device and computer readable storage medium
CN120570572A * | 2025-08-05 | 2025-09-02 | 中南大学湘雅二医院 (The Second Xiangya Hospital of Central South University) | Snore monitoring method and device for sleep apnea pathology discrimination


Similar Documents

Publication | Title
CN110491416B (en) | Telephone voice emotion analysis and identification method based on LSTM and SAE
CN116863956A (en) | Robust snore detection method and system based on convolutional neural network
CN102800316B (en) | Optimal codebook design method for voiceprint recognition system based on nerve network
CN109767785A (en) | Environmental noise recognition and classification method based on convolutional neural network
CN101546556B (en) | Classification system for identifying audio content
CN112885372A (en) | Intelligent diagnosis method, system, terminal and medium for power equipment fault sound
CN109473120A (en) | An abnormal sound signal recognition method based on convolutional neural network
CN117095694B (en) | Bird song recognition method based on tag hierarchical structure attribute relationship
CN102436810A (en) | Recording playback attack detection method and system based on channel mode noise
CN110047512A (en) | A kind of ambient sound classification method, system and relevant apparatus
CN117789699B (en) | Speech recognition method, device, electronic device and computer-readable storage medium
CN113936667A (en) | Bird song recognition model training method, recognition method and storage medium
CN105448302A (en) | Environment adaptive type voice reverberation elimination method and system
CN117976006A (en) | Audio processing method, device, computer equipment and storage medium
Zhang et al. | A novel insect sound recognition algorithm based on MFCC and CNN
CN119360872B (en) | A method for voice enhancement and noise reduction based on generative adversarial network
CN108806725A (en) | Speech differentiation method, apparatus, computer equipment and storage medium
CN119580783A (en) | A voice activity detection method, system, terminal and storage medium
CN117198300B (en) | A bird sound recognition method and device based on attention mechanism
CN113870896A (en) | Motion sound false judgment method and device based on time-frequency graph and convolutional neural network
CN118522309A (en) | Method and device for identifying noise sources along highway by using convolutional neural network
Ramalingam et al. | IEEE FEMH voice data challenge 2018
CN117894304A (en) | Distributed collaborative quality inspection method
CN113782051B (en) | Broadcast effect classification method and system, electronic equipment and storage medium
KR102259299B1 (en) | Book sound classification method using machine learning model of a book handling sounds

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
